
VoiceXML: Voice Extensible Markup Language

Presented by
Hongliang Xu

Presentation Overview
What is VoiceXML?
Introduction to VoiceXML
Overview of VoiceXML
A Sample VoiceXML Application
Summary
History

What is VoiceXML?
VoiceXML is a standard for voice-based
communication. It is an XML language that
plays the same role in voice applications that
HTML plays in web applications.
Like other XML technologies, VoiceXML
integrates with existing web-based
technologies and can be used with any existing
server-side technology, such as ASP or Java
servlets.

Goal of VoiceXML
VoiceXML's main goal is to bring the full power
of web development and content delivery to
voice response applications, and to free the
authors of such applications from low-level
programming and resource management.

Concepts
Dialogs and Subdialogs
Sessions
Grammars
Events
Links
Application

Introduction to VoiceXML
Two short examples
The advantages of using VoiceXML
The architectural design of VoiceXML
Elements of the VoiceXML Implementation

Hello World Example


Here are two short examples of VoiceXML. The first is the venerable Hello World:
<?xml version="1.0"?>
<vxml version="1.0">
<form>
<block>Hello World!</block>
</form>
</vxml>
The top-level element is <vxml>, which is mainly a container for dialogs. There are
two types of dialogs: forms and menus. Forms present information and gather
input; menus offer choices of what to do next. This example has a single form,
which contains a block that synthesizes and presents "Hello World!" to the user.
Since the form does not specify a successor dialog, the conversation ends.

Another Short Example


The second example asks the user for a choice of drink and then submits it to a
server script:
<?xml version="1.0"?>
<vxml version="1.0">
<form>
<field name="drink">
<prompt>Would you like coffee, tea, milk, or nothing?</prompt>
<grammar src="drink.gram" type="application/x-jsgf"/>
</field>
<block>
<submit next="http://www.drink.example/drink2.asp"/>
</block>
</form>
</vxml>

The advantages of using VoiceXML

Minimizes client/server interactions by specifying multiple
interactions per document.
Shields application developers from low-level and platform-specific
details.
Separates the user interaction code (which is given in
VoiceXML) from service logic.
Promotes service portability across implementation platforms.
VoiceXML is a common language for content providers, tool
providers and platform providers.
Is easy to use for simple interactions, and yet provides language
features to support complex dialogs.

The architectural design of VoiceXML


[Figure: architecture diagram. Labels: Voice User; Translation process; VoiceXML Client (e.g. PC); Client (e.g. mobile phone); VoiceXML Gateway; Content Server.]

Elements of the VoiceXML Implementation


[Figure: architectural model. Labels: Document Server (request, document); VoiceXML Interpreter Context; VoiceXML Interpreter; Implementation Platform.]

Overview of VoiceXML
Grammar
Structure of VoiceXML Applications
Different Types of Forms
A Form Dialog
A Menu Dialog
Sub Dialogs within a Form
Transition between Dialogs
Variables in VoiceXML Dialogs
Event Handling in VoiceXML

Grammar
A grammar defines the allowable inputs submitted by the user.
The <grammar> element is used to provide a speech grammar that:
specifies a set of utterances that a user may speak to perform an
action or supply information, and
provides a corresponding string value (in the case of a field
grammar) or set of attribute-value pairs (in the case of a form
grammar) to describe the information or action.
The <grammar> element is designed to accommodate any
grammar format that meets these two requirements. At this time,
VoiceXML does not specify a grammar format nor require support
of a particular grammar format. This is similar to the situation with
recorded audio formats for VoiceXML, and with media formats in
general for HTML.
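
As a rough illustration of the two ways a field grammar can be supplied, the sketch below gives one field an inline grammar and another an external one. It assumes a platform that accepts JSGF (VoiceXML 1.0 itself does not require any particular format); the field name "size" and the file size.gram are made up for this example.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
<form>
<field name="drink">
<prompt>Would you like coffee, tea, milk, or nothing?</prompt>
<!-- inline grammar: the element content lists the allowed utterances -->
<grammar type="application/x-jsgf">
coffee | tea | milk | nothing
</grammar>
</field>
<field name="size">
<prompt>Small, medium, or large?</prompt>
<!-- external grammar: size.gram is a hypothetical file on the document server -->
<grammar src="size.gram" type="application/x-jsgf"/>
</field>
</form>
</vxml>
```

In both cases the recognized utterance supplies the string value stored in the field's variable.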

Structure of VoiceXML applications


[Figure: application structure. A Root Document sits above Document1 through Document4; the documents contain Dialog 1 to Dialog 3 and Subdialog 1 and Subdialog 2.]

A form contains
A set of form items. Form items are subdivided
into field items, which define the form's
field item variables, and control items.
Declarations of non-field item variables.
Event handlers.
Filled actions, blocks of procedural logic that
execute when certain combinations of field
items are filled in.
The form attributes are id and scope.
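
The parts of a form listed above could appear together roughly as follows. This is only a sketch: the form id "order", the variable "attempts", and the inline JSGF grammar are assumptions chosen for illustration.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
<form id="order" scope="dialog">
<!-- declaration of a non-field item variable -->
<var name="attempts" expr="0"/>
<!-- event handler at form level -->
<catch event="nomatch">
<assign name="attempts" expr="attempts + 1"/>
Sorry, I did not understand.
<reprompt/>
</catch>
<!-- control item -->
<block>Welcome to the ordering service.</block>
<!-- field item: defines the field item variable "drink" -->
<field name="drink">
<prompt>Would you like coffee or tea?</prompt>
<grammar type="application/x-jsgf"> coffee | tea </grammar>
</field>
<!-- filled action: runs once "drink" has been filled in -->
<filled namelist="drink">
<prompt>You chose <value expr="drink"/>.</prompt>
</filled>
</form>
</vxml>
```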

Different Types of Forms


Directed Forms
Mixed Initiative Forms

Directed Forms
The simplest and most common type of form is one in which the form
items are executed exactly once in sequential order to implement a
computer-directed interaction. Here is a weather information service
that uses such a form.
<form id="weather_info">
<block>Welcome to the weather information
service.</block>
<field name="state">
<prompt>What state?</prompt>
<grammar src="state.gram"
type="application/x-jsgf"/>
<catch event="help">
Please speak the state for which you want the weather.
</catch>
</field>
<field name="city">
<prompt>What city?</prompt>
<grammar src="city.gram"
type="application/x-jsgf"/>
<catch event="help">
Please speak the city for which you want the weather.
</catch>
</field>
<block>
<submit next="/servlet/weather" namelist="city state"/>
</block>
</form>
This dialog proceeds sequentially:
C (computer): Welcome to the weather information service. What state?
H (human): Help
C: Please speak the state for which you want the weather.
H: Georgia
C: What city?
H: Tbilisi
C: I did not understand what you said. What city?
H: Macon

Mixed Initiative Forms


Directed forms implement rigid, computer-directed conversations.
To make a mixed initiative form, where both the computer and the
human direct the conversation, the form must have one or more <initial> form
items and one or more form-level grammars.
If a form has form-level grammars:
Its fields can be filled in any order.
More than one field can be filled as a result of a single user
utterance.
Also, the form's grammars can be active when the user is in other
dialogs. If a document has two forms on it, say a car rental form and
a hotel reservation form, and both forms have grammars that are
active for that document, a user could respond to a request for hotel
reservation information with information about the car rental, and
thus direct the computer to talk about the car rental instead. The
user can speak to any active grammar, and have fields set and
actions taken in response.
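
A mixed initiative version of the weather form might look like the sketch below. The grammar file cityandstate.gram is hypothetical and is assumed to return values for both the city and state fields; the servlet URL is the one used in the directed example.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
<form id="weather_info">
<!-- form-level grammar: one utterance may fill city, state, or both -->
<grammar src="cityandstate.gram" type="application/x-jsgf"/>
<block>Welcome to the weather information service.</block>
<!-- <initial> asks for form-wide input before any field is visited -->
<initial name="start">
<prompt>For what city and state would you like the weather?</prompt>
<help>Please say the name of the city and state.</help>
</initial>
<!-- fields left empty by the first utterance are collected one at a time -->
<field name="state">
<prompt>What state?</prompt>
</field>
<field name="city">
<prompt>What city?</prompt>
</field>
<!-- runs once both fields have values -->
<filled>
<submit next="/servlet/weather" namelist="city state"/>
</filled>
</form>
</vxml>
```

A caller who answers the first prompt with both a city and a state skips the two field prompts entirely.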

A Form Dialog
Fields
Recording input
Blocks and objects

Fields
A field specifies an input item to be gathered from the user.
The <field> type attribute is used to specify a built-in grammar for one
of the fundamental types, and also specifies how its value is to be
spoken if subsequently used in a value attribute in a prompt. An
example:
<field name="lo_fat_meal" type="boolean">
<prompt>
Do you want a low fat meal on this flight?
</prompt>
<help>
Low fat means less than 10 grams of fat, and under
250 calories.
</help>
<filled>
<prompt>
I heard <emp><value expr="lo_fat_meal"/></emp>.
</prompt>
</filled>
</field>

Fields (continued)
In this example, the boolean type indicates that inputs are various forms
of true and false. The value actually put into the field is either true or
false. The field would be read as "yes" or "no" in prompts.
It is important that there be input conventions for each built-in type, so
that, for instance, generic prompt and help messages can be written
that apply to all implementations of VoiceXML. These are locale-dependent,
and a certain amount of variability is allowed. For example,
the boolean type's grammar should minimally allow "yes" and "no"
responses, but each implementation is free to add other choices, such
as "yeah" and "nope". In cases where an application requires a different
behavior, it should use explicit field grammars.
In addition, each built-in type has a convention for the format of the
value returned. These are independent of locale and of the
implementation. The return type for built-in fields is string except for the
boolean field type.
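
As a small sketch of other built-in types, the form below uses digits and date, both of which are among the VoiceXML 1.0 built-in field types. The form id and field names are invented for illustration.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
<form id="booking">
<!-- built-in "digits" grammar: no <grammar> element is needed -->
<field name="ticket_num" type="digits">
<prompt>What is your ticket number?</prompt>
</field>
<!-- built-in "date" grammar -->
<field name="travel_date" type="date">
<prompt>On what date do you want to travel?</prompt>
</field>
<filled>
<prompt>
Ticket <value expr="ticket_num"/>, travelling on <value expr="travel_date"/>.
</prompt>
</filled>
</form>
</vxml>
```

The type also tells the platform how to speak each value when it is read back through <value>.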

Recording input
The <field> element is used for collecting user
input. The <record> element is used to record the user
input as voice. We can think of the <field> element as
analogous to a human speaking with another human,
while the <record> element is more analogous to a voice
mailbox.
In a typical use, the user is prompted for a greeting and then records it.
The greeting is played back and, if the user approves it, is
sent on to the server for storage using the HTTP POST
method. Like other field items, <record> has
prompts and catch elements. It may also have <filled>
actions. If the platform supports simultaneous recognition
and recording, form and document scoped grammars can
be active while the recording is in progress.
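
The greeting scenario could be written along the following lines, modeled on the kind of example given in the VoiceXML 1.0 specification. The script name save_greeting.pl and the specific attribute values (time limits, audio type) are assumptions for this sketch.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
<form>
<!-- record up to 10 seconds of audio into the variable "greeting" -->
<record name="greeting" beep="true" maxtime="10s"
finalsilence="4000ms" dtmfterm="true" type="audio/wav">
<prompt>At the tone, please say your greeting.</prompt>
<noinput>I didn't hear anything, please try again.</noinput>
</record>
<!-- play the recording back and ask for confirmation -->
<field name="confirm" type="boolean">
<prompt>Your greeting is <value expr="greeting"/>.</prompt>
<prompt>To keep it, say yes. To discard it, say no.</prompt>
<filled>
<if cond="confirm">
<!-- approved: send the recording to the server with HTTP POST -->
<submit next="save_greeting.pl" method="post" namelist="greeting"/>
</if>
<!-- rejected: clear the form items so the user can record again -->
<clear/>
</filled>
</field>
</form>
</vxml>
```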

Blocks
The <block> element contains a sequence of
procedural statements used for prompting and
computation, but not for gathering input. It is a control item.
A block has a (normally implicit) form item
variable that is set to true just before it is
interpreted. Because a form only visits items whose
variable is still undefined, each block is normally
executed once.
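
A minimal sketch of a block doing prompting and computation, with its form item variable made explicit through the name attribute. The names "greeter", "visits", and "welcome" are invented for this example.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
<form id="greeter">
<var name="visits" expr="3"/>
<!-- "welcome" is the block's form item variable; it becomes true
     just before the block runs, so the block is not visited again -->
<block name="welcome">
<if cond="visits &gt; 1">
Welcome back.
<else/>
Welcome.
</if>
<assign name="visits" expr="visits + 1"/>
</block>
</form>
</vxml>
```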

Object
Objects are used to declare and execute sections of
code. The <object> element is very similar to the HTML
<OBJECT> element.
The <object> item invokes a platform-specific "object"
with various parameters. The result of the platform
object is an ECMAScript Object with one or more
properties. One platform object could be a built-in
dialog that gathers credit card information. Another
could gather a text message using some proprietary
DTMF text entry method. There is no requirement for
implementations to provide platform-specific objects,
although support for the <object> element is required.

Menu Dialog
The menu dialog in VoiceXML is intended to allow the user to
make a selection from a set of choices. For example, in a
VoiceXML music shopping portal, the application uses a menu to ask
users whether they want to buy classical, pop or rock titles.
This menu leads the user to another dialog, which
depends on their selection. A simple example is given below:
<menu>
<prompt>
Welcome to Music Portal. Select the category you want to go to:
<enumerate/>
</prompt>
<choice next="http://www.mobilestore.com/vxml/classical.vxml">
Classical
</choice>
<choice next="http://www.mobilestore.com/vxml/pop.vxml">
Pop
</choice>
<choice next="http://www.mobilestore.com/vxml/rock.vxml">
Rock
</choice>
<noinput>
Please make a choice
<enumerate/>
</noinput>
</menu>

<enumerate/> lists all available options to the user. This code
defines a prompt and enumerates the
choices. In the example above, each choice directs the user to a
different section of the application through a hyperlink. The final
section defines the prompt to be spoken if the timeout is
exceeded: if the user does not speak within a specified time, the
application repeats the choices.
The entire menu-type dialog is enclosed within a <menu>
element. Inside the <menu> element there are other elements
intended to prompt the user for input and to collect this input.

Sub Dialogs within a Form


A subdialog in a form is a dialog within the scope
of another dialog. Subdialogs are
helpful for giving structure to a dialog.
For example, complex logic and input fields in a
dialog can be split off into subdialogs. Many fields
and complex entries are easier to manage
if a subdialog mechanism is used.
Subdialogs are linked to a main dialog using the
<subdialog> element.
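
As an illustration, the sketch below splits the collection of a license number into a subdialog in the same document. The form ids and variable names are assumptions; the values passed back with <return> become properties of the subdialog's result variable.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
<form id="main">
<!-- run the form "get_license" as a subdialog -->
<subdialog name="result" src="#get_license">
<filled>
<!-- returned variables are read as properties of "result" -->
<prompt>Your license number is <value expr="result.license"/>.</prompt>
</filled>
</subdialog>
</form>
<form id="get_license">
<field name="license" type="digits">
<prompt>Please say your driver's license number.</prompt>
<filled>
<!-- hand control and the collected value back to the caller -->
<return namelist="license"/>
</filled>
</field>
</form>
</vxml>
```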

Transition between Dialogs


A VoiceXML application consists of a set of dialogs, and the
dialogs can submit data to other dialogs or documents. In practice,
the information is passed between documents on the content server.
The two elements used for transferring execution between forms or
documents are <goto> and <submit>.
The <goto> element is used to:
transition to another form item in the current form,
transition to another dialog in the current document, or
transition to another document.
The attributes used by <goto> are listed below:
next
The URI to which to transition.
expr
An ECMAScript expression that yields the URI.
nextitem
The name of the next form item to visit in the current form.
expritem
An ECMAScript expression that yields the name of
the next form item to visit.
The <submit> element is similar to <goto> in that it results in a
new document being obtained. Unlike <goto>, it lets you submit
a list of variables to the document server via an HTTP GET or
POST request. Commonly used attributes besides next and
expr are listed below:
namelist
The list of variables to submit. By default, all the named field
item variables are submitted. If a namelist is supplied, it may
contain individual variable references which are submitted with
the same qualification used in the namelist.
method
The request method: get (the default) or post.

enctype
The MIME encoding type of the submitted document. The
default is application/x-www-form-urlencoded. Interpreters may
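
The two transition elements can be seen side by side in a sketch like the one below. The form ids are invented; city.gram and /servlet/weather are reused from the directed form example, and method="post" is chosen only to show the attribute.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
<form id="start">
<block>
Welcome.
<!-- <goto>: move to another dialog in the same document, no data sent -->
<goto next="#ask_city"/>
</block>
</form>
<form id="ask_city">
<field name="city">
<prompt>What city?</prompt>
<grammar src="city.gram" type="application/x-jsgf"/>
</field>
<block>
<!-- <submit>: fetch a new document and send the "city" variable by POST -->
<submit next="/servlet/weather" method="post" namelist="city"/>
</block>
</form>
</vxml>
```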

Variables in VoiceXML Dialogs


Variables are declared by <var> elements. A
variable is named using the name attribute and can
be given an initial value using the
expr attribute.
Variables are also declared by form items, such as
<field> and <record>. VoiceXML variables are in
all respects equivalent to ECMAScript variables.
The variable naming convention is as in
ECMAScript, but names beginning with the
underscore character (_) are reserved for
internal use.
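
A short sketch of both kinds of declaration follows. All names are illustrative. Note that expr holds an ECMAScript expression, so a string value needs its own inner quotes.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
<form id="vars">
<!-- string literal: inner single quotes are required -->
<var name="phone" expr="'6305551212'"/>
<!-- numeric initial value -->
<var name="attempts" expr="0"/>
<!-- no expr: the variable starts out undefined -->
<var name="home"/>
<!-- a form item also declares a variable, here "city" -->
<field name="city">
<prompt>What city?</prompt>
<grammar type="application/x-jsgf"> Macon | Atlanta </grammar>
<filled>
<assign name="home" expr="city"/>
<prompt>Home city set to <value expr="home"/>.</prompt>
</filled>
</field>
</form>
</vxml>
```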

Variable Scope hierarchy


The curved arrows in this diagram show that each scope contains a
variable whose name is the same as the scope and that refers to the
scope itself. This allows you, for example, in the anonymous, dialog,
and document scopes to refer to a variable X in the document
scope using document.X.
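
The sketch below shows one way the scope prefix can be used: a dialog-level variable hides a document-level variable of the same name, and the document. prefix reaches the outer one. The variable name "greeting" and its values are made up for this example.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
<!-- declared at document level -->
<var name="greeting" expr="'hello from the document'"/>
<form id="scopes">
<!-- declared at dialog level; hides the document-level variable -->
<var name="greeting" expr="'hello from the dialog'"/>
<block>
<!-- unqualified name: resolved in the nearest enclosing scope, the dialog -->
<prompt><value expr="greeting"/></prompt>
<!-- qualified name: explicitly selects the document scope -->
<prompt><value expr="document.greeting"/></prompt>
</block>
</form>
</vxml>
```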

Event Handling in VoiceXML

[Figure: event handling. Labels: Application; Interpreter Context; Platform; User.]

A Sample VoiceXML Application


This sample intends to show how VoiceXML works and to
illustrate the difficulties of voice browsing.
The application's function is to provide information about
different geographical locations around the world.
The application does two things:
1. Finds a place if a latitude and longitude are entered
2. Finds the latitude and longitude of a particular place
The root document, geo.vxml, acts as a startup form:


<?xml version="1.0"?>
<vxml version="1.0">
<form>
<block>
Welcome to Geoplanet
<goto next="geomain.vxml"/>
</block>
</form>
</vxml>

A Sample VoiceXML Application


<?xml version="1.0"?>
<vxml version="1.0">
<menu>
<prompt>
Please select the category you
want to use
<enumerate/>
</prompt>
<choice next="getlat.vxml">
find a place
</choice>
<noinput>
Please make a choice
<enumerate/>
</noinput>
</menu>
</vxml>

After giving a greeting, the root document directs the user to
geomain.vxml, where the user chooses whether to find a place by
latitude and longitude or to get the latitude and longitude of a
particular place.

A Sample VoiceXML Application


If the user decides to find a place by giving a latitude and
longitude, they are sent to findplace.vxml:
<?xml version="1.0"?>
<vxml version="1.0">
<form>
<field name="latitude" type="number">
<prompt>
Please give the latitude
</prompt>
</field>
<field name="longitude" type="number">
<prompt>
Please give the longitude
</prompt>
</field>
<filled namelist="latitude longitude">
<submit next="http://localhost:8080/servlet/get_place/"/>
</filled>
</form>
</vxml>

A Sample VoiceXML Application


<?xml version="1.0"?>
<!-- getlat.vxml -->
<vxml version="1.0" application="geo1.vxml">
<form>
<field name="place">
<prompt>
Please say the name of a place
</prompt>
<grammar src="place.gram" type="application/x-jsgf"/>
</field>
<block>
<submit next="http://localhost:8080/servlet/get_place/"/>
</block>
</form>
</vxml>

Summary

VoiceXML applications consist of many documents, each of
which contains dialogs. These dialogs take the form of menus or
forms. Forms can follow one of two schemes: directed or
mixed initiative.
The implementation of VoiceXML can be either telephony-based
or browser-based. In a telephony-based
implementation, the user connects by telephone to a VoiceXML
gateway, which in turn connects to a content server.
In a browser-based implementation, the user interacts
directly with a browser that connects to the content server. The
browser or gateway works with a TTS (Text to Speech) engine
and a speech recognition engine, which may be implemented in
hardware or software, to render VoiceXML dialogs to the user.

Next Generation Interface


Voice Browsing: Design and limitations
Voice browsing in its fullest sense may not be
acceptable to users for a number of reasons: users
cannot retain a list of menu items for long, and they
tend to prefer familiar kinds of conversation to
formal automated ones.
Hence, to be successful, a VoiceXML application should be based on
natural language dialog rather than being a simple
translation of a graphical user interface into a
voice-based one.

Revision History
Version   Date          Description
0.9       17 Aug 1999   Initial release. Provided as baseline in support of comment period from supporters.
1.0 RC    02 Mar 2000   Release Candidate (released to Forum Supporters)
1.0       07 Mar 2000   Released to public - editorial corrections from 1.0 RC

Resources
VoiceXML Forum
http://www.voicexml.org/
VoiceXML Reference
http://studio.tellme.com/voicexmlref/
W3C VoiceXML version 1.0
Professional WAP, Wrox Press, 2000

Thank You!
