VXML g9104

Speech Technologies and VoiceXML
Guided by Mr. R. P. Ojha
Presentation Agenda
• Background: voice technologies (ASR/TTS)
• Voice browsing with VoiceXML
• VoiceXML architecture
• VoiceXML programming
• Future of VoiceXML
• Summary
Voice Technologies
• In the mid- to late 1990s, personal computers became powerful enough to support automatic speech recognition (ASR).
• The two key technologies behind these advances are speech recognition (SR) and text-to-speech synthesis (TTS).
[Figure: Speech Recognition / Speech Synthesis]
Pervasive Computing Model
• E-business has shifted from a client-server model to a web-centric model.
• Once connected to the Internet, people can get any information they want, but they want a more convenient way to connect.
• Lou Gerstner, CEO of IBM: the pervasive computing model means a billion people interacting with a million e-businesses through a trillion interconnected devices.
Voice Browsing
• VoiceXML instead of HTML
• A voice browser instead of an ordinary web browser
• A phone instead of a PC
VoiceXML Key Design Issues
• Speech input: speech recognition and DTMF
• Speech output: pre-recorded audio and synthesized speech
• Internet: XML, IP, HTTP, SSL, JavaScript
• Telephony: call transfer, data passing
W3C Voice Browser Working Group
• Founded May 1999
• 60 member companies
• Mission: a standards group that prepares and reviews markup languages to enable Internet-based speech applications
• http://www.w3.org/Voice
VoiceXML Forum
• Industry group to promote VoiceXML
• 550+ member companies
• Submitted VoiceXML 1.0 to the W3C in May 2000
• http://www.voicexml.org
VoiceXML v1.0 (May 2000)
• VoiceXML Forum specification submitted to the W3C

VoiceXML v2.0
• W3C Voice Browser Working Group: 50+ members collaborating
• Addressed 400+ change requests
VoiceXML Overview
• A language for specifying voice dialogs.
• Voice dialogs use audio prompts and text-to-speech (TTS) for output, and touch-tone keys (DTMF) and automatic speech recognition (ASR) for input.
• The main input/output device is (initially) the phone.
• Leverages the Internet for application development and delivery.
• A standard language enables portability.
[Figure: VoiceXML dialog]
VoiceXML Platform Architecture

VoiceXML Platform Architecture-1: Telephone and Telephony Server
• Telephone network: connects the caller's telephone with the telephony server
• VoiceXML gateway

VoiceXML Platform Architecture-2: Voice Browser
• Audio input: speech recognition (ASR), touch-tone (DTMF), audio recording
• Audio output: audio playback, speech synthesis (TTS)
• Interfaces: call control
VoiceXML Documents
• Dialog and flow control
• Client-side scripting (ECMAScript)
• Speech recognition grammars
• Speech synthesis pronunciation control

Document servers (web servers)
• Serve static VoiceXML documents or audio files

Application servers
• Generate VoiceXML documents dynamically
• Run server-side application logic
• Connect to a database or database interface
Example: weather.jsp and the VoiceXML it generates

JSP source (weather.jsp):
    <% user.storePreference( try) %>
    <form>
      <block> <%= weather.getTemp() %> </block>
    </form>

VoiceXML generated for the browser:
    <form>
      <block> 25 </block>
    </form>
[Figure labels: voice gateway (VoiceXML browser), web server + Servlet/JSP engine, DB]
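The weather.jsp example shows an application server filling a live value into a VoiceXML template at request time. Below is a minimal Python sketch of the same idea, assuming a hypothetical `get_temperature()` function in place of `weather.getTemp()` and the database; a real deployment would return this text from a web server to the voice gateway rather than print it.

```python
from xml.sax.saxutils import escape


def get_temperature() -> int:
    """Hypothetical stand-in for weather.getTemp(); a real server would query the DB."""
    return 25


def render_weather_vxml(temp: int) -> str:
    """Build the VoiceXML document that the voice browser will interpret."""
    # escape() keeps dynamic text from breaking the XML markup.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        "  <form>\n"
        f"    <block>{escape(str(temp))}</block>\n"
        "  </form>\n"
        "</vxml>\n"
    )


if __name__ == "__main__":
    print(render_weather_vxml(get_temperature()))
```

Keeping data access (`get_temperature`) separate from rendering mirrors the JSP split between application logic and markup.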
Implementations of VoiceXML
• In Taiwan (gateways): Yes Mobile, Telecom Laboratories (Chunghwa), eWings Technologies, Inc.
• Free: IBM VoiceServer SDK
• Open source: CMU OpenVXI
[DEMO] A Simple VoiceXML Application
• A simple VoiceXML application that introduces the Department of Computer Science.
• Experience shows that building a corresponding HTML version first is helpful.
Document
• A VoiceXML document defines one or more dialogs.
• The user is always in exactly one dialog at any time.
• Each dialog specifies the next dialog to transition to, using a URL.
[Figure: doc1.vxml contains Dialog 1 (transition: #dialog2) and Dialog 2 (transition: http://xyz.com/doc2.vxml)]
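A transition URL is resolved against the current document's URL: a bare fragment stays in the same document, while a full URL loads another document. The Python sketch below shows that resolution using the standard `urllib.parse` module. It assumes, since the slide does not say, that doc1.vxml lives at `http://xyz.com/doc1.vxml` and that Dialog 2 has the id `dialog2`.

```python
from urllib.parse import urldefrag, urljoin


def resolve_transition(current_doc: str, target: str) -> tuple[str, str, bool]:
    """Resolve a dialog transition against the current document's URL.

    Returns (document URL, dialog id, stays_in_same_document).
    An empty dialog id means "the first dialog of that document".
    """
    absolute = urljoin(current_doc, target)   # "#dialog2" -> current doc + fragment
    doc_url, dialog_id = urldefrag(absolute)  # split off the dialog id
    same_doc = doc_url == urldefrag(current_doc)[0]
    return doc_url, dialog_id, same_doc
```

For example, `resolve_transition("http://xyz.com/doc1.vxml", "#dialog2")` stays in doc1.vxml, whereas a target of `http://xyz.com/doc2.vxml` requires fetching a new document.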
Dialog
• A dialog describes an interaction between the user and the system.
• There are two kinds of dialogs: forms and menus.
VoiceXML Document Structure

Form
[Figure labels: form, field, grammar, input, output, eval]
    <form>
      <field name="travellers">
        <grammar mode="voice" src="./number.grxml"/>
        <prompt>How many are travelling?</prompt>
        <filled>
          <submit next="http://travel.com/order"/>
        </filled>
      </field>
    </form>
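The grammar decides which utterances the field accepts and what value each one places into the field variable. The contents of `./number.grxml` are not shown, so the sketch below uses a hypothetical dictionary as a stand-in for "grammar plus recognizer". It is only an illustration of the field's accept/fill/re-prompt behaviour, not an SRGS implementation.

```python
from typing import Optional

# Hypothetical stand-in for ./number.grxml: phrases a caller may say,
# mapped to the value each one returns to the field.
NUMBER_GRAMMAR = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def match_number(utterance: str) -> Optional[int]:
    """Return the field value for an in-grammar utterance, or None (nomatch)."""
    return NUMBER_GRAMMAR.get(utterance.strip().lower())


def fill_travellers(utterance: str) -> str:
    """Mimic the <field>/<filled> pair: fill on a match, otherwise re-prompt."""
    value = match_number(utterance)
    if value is None:
        return "nomatch: How many are travelling?"  # prompt is played again
    return f"filled travellers={value}; submit to http://travel.com/order"
```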
Menu
    <menu id="commands">
      <prompt>What service would you like?</prompt>
      <choice next="/cars">Car hire</choice>
      <choice next="/news">Today's news</choice>
      <choice next="/hotels">Hotel reservations</choice>
    </menu>
[Figure labels: menu, form, user, URL]
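A menu maps what the user says onto one `<choice>` and then jumps to that choice's `next` URI. The Python sketch below shows that dispatch with an exact-phrase match. This is a simplification: a real platform generates a grammar from the choice text, and the menu's `accept` attribute can allow approximate (sub-phrase) matches.

```python
from typing import Optional

# The three <choice> elements from the menu: spoken text -> next URI.
CHOICES = [
    ("car hire", "/cars"),
    ("today's news", "/news"),
    ("hotel reservations", "/hotels"),
]


def select_choice(utterance: str) -> Optional[str]:
    """Return the next URI for the matching choice, or None for a nomatch."""
    said = " ".join(utterance.lower().split())  # normalize case and spacing
    for phrase, next_uri in CHOICES:
        if said == phrase:
            return next_uri
    return None
```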
Submit
• Typically used to send results from the client to the server
• Syntax: <submit next="URI" namelist="var1 var2 ..."/>
• namelist: the fields (variables) whose values are submitted
Submit, Example
    <form>
      <field name="dest_city">
        <prompt>Where do you want to go to?</prompt>
        <grammar mode="voice" src="./cities.grxml"/>
      </field>
      <field name="travellers">
        <prompt>How many are travelling to <value expr="dest_city"/>?</prompt>
        <grammar mode="voice" src="./number.grxml"/>
      </field>
      <filled>
        Thank you. Your order is now being processed.
        <submit next="http://travel.com/order" namelist="dest_city travellers"/>
      </filled>
    </form>
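With the default `method="get"`, `<submit>` sends the namelist variables to the server as a URL-encoded query string. The Python sketch below builds that URL for a form with a destination-city field and a traveller-count field; the field names and values are illustrative.

```python
from urllib.parse import urlencode


def build_submit_url(next_uri: str, values: dict, namelist: list[str]) -> str:
    """Sketch of <submit method="get">: append namelist variables as a query string."""
    # urlencode uses application/x-www-form-urlencoded (spaces become "+").
    query = urlencode([(name, values[name]) for name in namelist])
    separator = "&" if "?" in next_uri else "?"
    return f"{next_uri}{separator}{query}"
```

Only the variables listed in `namelist` are sent, in the listed order, so the server receives exactly the fields it expects.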
Variables
Variables can be declared, manipulated, and referenced:
• Declared by a form item: <field name="user2">
• Assigned: <assign name="user1" expr="'peter'"/>
• Cleared: <clear namelist="user1 user2"/>
• Referenced: How many are travelling to <value expr="dest_city"/>?
Variable Scope
[Figure: nested scopes session, application, document, dialog; a variable name is searched from the innermost scope outward]
• Session variables are read-only variables provided by the interpreter context.
• Anonymous scope: defined by an element containing executable content (<block>, <filled>, or an event handler).
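Name resolution walks the scopes from the innermost (anonymous) scope outward to session. The Python sketch below models that chain. Only the scope names come from VoiceXML; the variable contents are made up. It also models two rules: `<assign>` updates a variable where it was declared rather than creating one, and session variables are read-only (a `MappingProxyType` here).

```python
from types import MappingProxyType


class ScopeChain:
    """Resolve variable names from the innermost scope outward."""

    def __init__(self, *scopes):
        # Innermost first: anonymous, dialog, document, application, session.
        self.scopes = scopes

    def lookup(self, name):
        for scope in self.scopes:
            if name in scope:
                return scope[name]
        raise NameError(f"undeclared variable: {name}")

    def assign(self, name, value):
        # <assign> updates the variable where it was declared; it never creates one.
        for scope in self.scopes:
            if name in scope:
                if isinstance(scope, MappingProxyType):
                    raise PermissionError(f"{name} is read-only")
                scope[name] = value
                return
        raise NameError(f"undeclared variable: {name}")


# Illustrative contents only.
session = MappingProxyType({"caller_id": "5551234"})  # read-only, set by the interpreter
application = {"lang": "en-US"}
document = {"lang": "zh-TW", "user1": None}           # shadows application's "lang"
dialog = {"travellers": None}
anonymous = {}                                        # e.g. the scope of a <block>

chain = ScopeChain(anonymous, dialog, document, application, session)
```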
Events
• Events are used to signal unexpected situations.
• Events are caught by a <catch> event handler:
    <catch event="com.acme.mailreader">...</catch>
    <catch event="nomatch noinput">...</catch>
• Shortcut: <nomatch> is equivalent to <catch event="nomatch">
• Other shortcuts: <noinput>, <error>
Events, Example
    <field name="dest_city">
      <prompt>Where do you want to go to?</prompt>
      <grammar mode="voice" src="./cities.grxml"/>
      <nomatch>Please say the city you want to fly to.</nomatch>
    </field>
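In VoiceXML 2.0, a catch handler matches a thrown event if its event name is the same name or a dot-separated token prefix of it, so `error` catches `error.badfetch`. Handlers are searched from the innermost scope outward. The Python sketch below implements that selection rule over a hypothetical handler list. It ignores the `count` and `cond` attributes and the platform's default handlers.

```python
from typing import Optional


def event_matches(catch_event: str, thrown: str) -> bool:
    """Token-prefix match with '.' as separator; an empty name catches everything."""
    prefix = catch_event.rstrip(".")
    if not prefix:
        return True
    wanted = prefix.split(".")
    return thrown.split(".")[: len(wanted)] == wanted


def select_handler(handlers, thrown: str) -> Optional[str]:
    """handlers: (event attribute, action) pairs, innermost scope first and
    document order within a scope. Returns the first matching action."""
    for event_attr, action in handlers:
        names = event_attr.split() or [""]  # no event attribute = catch all
        if any(event_matches(name, thrown) for name in names):
            return action
    return None


# Hypothetical handlers: the field-level <nomatch> comes before broader ones.
HANDLERS = [
    ("nomatch", "Please say the city you want to fly to."),
    ("nomatch noinput", "Sorry, I did not understand."),
    ("error", "An error occurred."),
]
```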
Multimodal Web Browsing
• XHTML + VoiceXML
• SALT

[DEMO] Multimodal Browsing
Future of the Voice Web and VoiceXML
• Sun/SpeechWorks (1999): JSML, JSGF
• VoiceXML Forum (2000): VoiceXML 1.0
• W3C (2003, Candidate Recommendation): VoiceXML 2.0
• Microsoft-led (2002): SALT (Speech Application Language Tags)
• W3C, next steps: VoiceXML 3?, speech synthesis (SSML), speech recognition grammar, speech semantics, NLP, pronunciation lexicon [early], call control [early]
• Voice browser interoperation
Conclusion
• Speech is the most natural way for humans to communicate, so it will become an important mode of human-computer interaction (HCI).
• VoiceXML has revolutionized the development and deployment of speech recognition and telephony applications.
Q&A
Backup
History of VoiceXML
Source: VoiceXML Forum (http://www.voicexml.org)
Show: VoiceXML in Daily Life
Classification of Voice Applications

Basic interactive voice response (IVR)
    Computer: For stock quotes, press 1. For trading, press 2.
    Human: (presses DTMF 1)

Basic speech ASR
    C: Say the stock name for a price quote.
    H: Lucent Technologies

Advanced speech ASR
    C: Stock Services, how may I help you?
    H: Uh, what's Lucent trading at?

Near-natural language ASR
    C: How may I help you?
    H: Um, yeah, I'd like to get the current price of Lucent Technologies.
    C: Lucent is up two at sixty-eight and a half.
    H: OK. I want to buy one hundred shares at market price.
    C:
Speech Recognition
• Capturing speech (analog) signals
• Digitizing the sound waves and converting them to basic language units, or phonemes
• Constructing words from phonemes and analyzing the words in context to ensure correct spelling of words that sound alike (such as "write" and "right")
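The last step, choosing the right spelling for words that sound alike, can be illustrated with a toy lexicon and word-pair counts. Everything in the Python sketch below is made up for illustration: the lexicon entries, the ARPAbet-style phoneme strings, and the bigram counts. Real recognizers use acoustic models and statistical language models over many competing hypotheses.

```python
# Toy pronunciation lexicon and made-up bigram counts (illustrative only).
LEXICON = {
    ("R", "AY", "T"): ["write", "right"],  # homophones share one pronunciation
    ("T", "UW"): ["to", "two", "too"],
}
BIGRAM_COUNTS = {
    ("to", "write"): 5, ("to", "right"): 1,
    ("the", "right"): 4, ("the", "write"): 0,
}


def pick_word(previous_word: str, phonemes: tuple) -> str:
    """Choose the spelling that is most likely after previous_word."""
    candidates = LEXICON[phonemes]
    # Unseen word pairs count as 0; ties keep the lexicon's order.
    return max(candidates, key=lambda w: BIGRAM_COUNTS.get((previous_word, w), 0))
```

So the same phonemes come out as "write" after "to" but as "right" after "the".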
Speech Synthesis
Speech synthesis, or text-to-speech (TTS), is the process of converting text into spoken language:
• Breaking the words down into phonemes
• Analyzing the text for items that need special handling, such as numbers and currency amounts
• Generating the digital audio for playback
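The "special handling" step is usually called text normalization: symbols and digits are rewritten as the words a synthesizer should speak. The Python sketch below handles dollar amounts and integers from 0 to 999 using regular expressions. The word lists and the supported range are choices made for this sketch, not part of any TTS engine's API.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999 (the range this sketch supports)."""
    if not 0 <= n < 1000:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("" if rest == 0 else "-" + ONES[rest])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)


def normalize(text: str) -> str:
    """Rewrite currency amounts, then bare integers, as words before synthesis."""
    def dollars(match):
        n = int(match.group(1))
        return number_to_words(n) + (" dollar" if n == 1 else " dollars")

    text = re.sub(r"\$(\d{1,3})\b", dollars, text)  # "$68" -> "sixty-eight dollars"
    return re.sub(r"\b\d{1,3}\b", lambda m: number_to_words(int(m.group(0))), text)
```

Currency is handled before bare numbers so that "$68" becomes "sixty-eight dollars" rather than "$sixty-eight".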
VoiceXML Gateway (detail)

Programming VoiceXML
• Writing a VoiceXML application is programming.
• Control constructs are procedural (if-else, etc.).
• The VoiceXML platform iterates through a <form> until values for all field items have been collected.
VoiceXML System Components
[Figure labels: PBX, telecom boards, VoiceXML server, software utilities, speech synthesis (TTS), speech recognition (SR), speech grammars, voice biometrics, call centre, CT (computer telephony) integration]
• VoiceXML servers act as integrators of various hardware and software components.
FIA - Form Interpretation Algorithm
• The FIA has a main loop that repeatedly selects a form item and then visits it.
• The first form item (in document order) whose form item variable is undefined is selected.
• As a result, the user is prompted for each field item in turn.

Example Form
    <form>
      <block>
        <prompt>Where do you want to go to and how many are travelling?</prompt>
      </block>
      <!-- Field item 1 -->
      <field name="dest_city">
        <prompt>Where do you want to go to?</prompt>
        <grammar mode="voice" src="./cities.grxml"/>
      </field>
      <!-- Field item 2 -->
      <field name="travellers">
        <prompt>How many are travelling to your destination?</prompt>
        <grammar mode="voice" src="./number.grxml"/>
      </field>
    </form>
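The select/visit loop described above can be sketched in Python for field items only. Python's `None` stands in for ECMAScript's undefined, and a caller-supplied `recognize` function simulates the recognizer, returning `None` on a no-match. The sketch deliberately leaves out blocks, `cond` guards, event handling, and mixed initiative, all of which the real FIA also covers.

```python
FIELDS = [
    ("dest_city", "Where do you want to go to?"),
    ("travellers", "How many are travelling to your destination?"),
]


def run_form(fields, recognize, max_turns=20):
    """fields: (name, prompt) pairs in document order.
    recognize: callable(prompt) -> value, or None for a no-match."""
    values = {name: None for name, _ in fields}  # None = undefined
    transcript = []
    for _ in range(max_turns):
        # Select: first field item (document order) whose variable is undefined.
        selected = next(((n, p) for n, p in fields if values[n] is None), None)
        if selected is None:
            return values, transcript  # every field is filled: the form is done
        name, prompt = selected
        # Collect: play the prompt and wait for input.
        transcript.append(prompt)
        result = recognize(prompt)
        # Process: a match fills the variable; a no-match leaves it undefined,
        # so the same field is selected (and prompted) again.
        if result is not None:
            values[name] = result
    raise RuntimeError("gave up after too many turns")
```

With scripted answers "London", a no-match, and then 3, the caller hears the destination prompt once and the travellers prompt twice.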
if, else and elseif
    <form>
      ...
      <filled>
        <if cond="travellers > 10">
          Sorry, we cannot handle groups larger than 10 persons.
          <clear namelist="travellers"/>
        <elseif cond="travellers > 5 &amp;&amp; dest_city == 'London'"/>
          Sorry, we cannot handle groups larger than 5 persons travelling to London.
          <clear namelist="dest_city travellers"/>
        <else/>
          <submit next="http://travel.com/order"/>
        </if>
      </filled>
    </form>
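The same validation logic is written out in Python below. The field names `dest_city` and `travellers` are assumed, and `None` stands for an undefined (cleared) variable. The point of `<clear>` is that it makes the variables undefined again, so the form interpretation algorithm selects those fields on its next pass and prompts the caller for them again.

```python
def on_form_filled(values: dict) -> str:
    """Mirror the <filled> block: validate, clear fields to re-prompt, or submit."""
    travellers, dest_city = values["travellers"], values["dest_city"]
    if travellers > 10:
        values["travellers"] = None  # <clear namelist="travellers"/>
        return "Sorry, we cannot handle groups larger than 10 persons"
    elif travellers > 5 and dest_city == "London":
        values["dest_city"] = values["travellers"] = None  # clear both fields
        return "Sorry, we cannot handle groups larger than 5 persons travelling to London"
    else:
        return "submit http://travel.com/order"
```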
JSML - JSpeech Markup Language
• Developed by Sun and SpeechWorks as a markup language for text-to-speech dialogs
• Based on the Java Speech API Markup Language (http://java.sun.com/products/java-media/speech/)
• Text annotation that provides hints to speech synthesizers
• Aimed at making TTS speech more natural and more understandable
• Feature set:
  - hints on word pronunciation
  - hints on phrasing, emphasis, pitch, and speaking rate
  - marker elements: notifications from the speech synthesizer to the application when a marker is reached
JSGF - JSpeech Grammar Format
• Developed by Sun and SpeechWorks as a syntax for expressing speech grammars
• Based on the Java Speech API Grammar Format (http://java.sun.com/products/java-media/speech/)
Microsoft's SALT
Speech Application Language Tags
• Backed by Microsoft, Cisco, Intel, Comverse, SpeechWorks, and Philips
• A lightweight set of tags designed to be used with HTML and XHTML to enable telephony applications driven from regular Web documents
• Targeted at supporting multimodal access
