
ENHANCING A VOICE-ENABLED WEB BROWSER FOR THE VISUALLY IMPAIRED

Atiwong Suchato*, Jirasak Chirathivat, Proadpran Punyabukkana

Spoken Language Systems Research Group,
Department of Computer Engineering, Chulalongkorn University,
Phyathai Rd., Pathumwan, 10330 Bangkok, Thailand

ABSTRACT

A vast body of digital content resides on the World Wide Web, which people
access via web browsers. Automatic speech recognition and Text-to-Speech (TTS)
components have been incorporated into some web browsers to help the visually
impaired access web content. Webpage-reader programs usually read out textual
content sequentially, in the order in which it appears on the webpage.
Presenting a webpage’s content aurally in such a sequential fashion often does
not give visually-impaired users a clear picture of the content structure of
that webpage. Furthermore, it is not uncommon for webpage-reader programs to
read text that is unrelated to the main content. This paper proposes a
hierarchical webpage content representation, which allows webpage content to be
stored in a tree structure, and an associated XML format that defines the rules
for parsing webpage content into the hierarchical representation. The parser
was implemented in a Thai voice-enabled web browser with multilingual TTS
capability, named CUVoiceBrowser, so that the targeted webpage content is
parsed into the hierarchical representation before being read to the users
accordingly. The browser’s command list was extended to support navigation
through the hierarchical webpage content representation. The concept is
demonstrated by applying it to the front page of a newspaper website.

KEYWORDS:

Assistive Technologies for Persons with Disabilities, Speech Recognition Application

* Corresponding author. Tel: 66-2-218-6956 Fax: 66-2-218-6955 E-mail: Atiwong.S@chula.ac.th


1. INTRODUCTION

With advances in internet software and high-speed connections, surfing the web has
undeniably become an integral part of many people’s lives. While the web seems to be an
ideal source of information and services for many people, it is currently far from ideal for
the blind and people with low vision. The visually-impaired population usually has
difficulty using the web, let alone reaping the full benefits enjoyed by people with normal
sight. An obvious source of difficulty is that most web content is presented visually, i.e. in
the form of text and pictures, in web browsers. Digital content in other formats that include
sound may be available on some websites, but only rarely. More importantly, such content is
not intended to solely represent the overall content of a webpage. Thus, it will not render
textual content obsolete, at least in the foreseeable future.
Another source of difficulty concerns the input methods used in the user interfaces (UIs)
of currently available web browsers. Web browsers’ UIs are usually designed around the most
commonly used devices, namely keyboards and mice. Unlike motor-handicapped people, the
visually impaired usually have no problem learning to use keyboards and mice. The difficulty
is that they cannot see the positions of their mouse pointers or text cursors. When surfing
the web, navigating via hyperlinks and locating elements of web forms, such as input
textboxes, seem almost impossible without appropriate assistive technologies.
Technologies have been used to ease such difficulties. Some libraries around the world
deploy assistive technologies heavily, with the aim that any individual should be able to
access library resources equally regardless of their sight [1]. Many assistive technologies
are based on Braille characters [2]. However, speech technology is a preferred choice for
many researchers and developers [3]. One reason is that speech communication is natural.
Also, the data rate of speech communication is higher than that of typing words on an
ordinary or Braille keyboard. Furthermore, efforts spent on improving speech technology
benefit not only the visually-impaired community but also society in general. Common
components that use speech technology to help the visually impaired access web-based
information include webpage-reader programs, automatic speech recognition systems, and
voice-enabled web browsers.
A webpage-reader program is a computer program that reads the text on a webpage. It
utilizes a Text-to-Speech (TTS) module, which generates speech sounds from the corresponding
text. Note that the most immediate goal of a webpage-reader for the visually impaired is not
to read text more naturally but to read it appropriately and with high intelligibility.
Automatic Speech Recognition (ASR) software provides a probabilistic mapping from sounds to
text [4]. The role of an automatic speech recognizer in assisting the visually impaired is to
provide an alternative means of controlling and entering data into voice-enabled computer
programs. A voice-enabled web browser is a web browser that can be operated using voice
commands, in addition to normal keyboard and mouse operations. It usually has a
webpage-reader program integrated, so that users can surf the web purely by voice if they
wish or need to. Many researchers have successfully developed prototypes to demonstrate the
concept of web surfing via voice in various languages. Hemphill et al. developed
voice-enabled navigation via a speakable hotlist, speakable links and smart pages using a
speaker-independent speech recognizer [5]. Brondsted et al. built a Danish voice-enabled web
browser for motor-handicapped users [6]. Punyabukkana et al. demonstrated such a capability
in Thai [7].
2. MATERIALS AND METHODS

2.1 Hierarchical Structured Content

A webpage-reader program usually reads webpage content in order of appearance. The
information is therefore presented sequentially, regardless of how the actual content is
organized. Users with normal sight can visually capture the structure of the webpage content
from its formatting and form a mental model of its organization. Without such visual
information, however, the visually impaired are easily lost after the program has been
reading the webpage content for a while. Consider the illustration of a webpage content
structure in Fig. 1. This webpage contains an article on volcanoes¹. The font sizes, the bold
faces, and the underlines in this illustration resemble the formatting used in the original
webpage. Such formatting naturally helps the reader understand the organization of the
content. If this content is read by a webpage-reader sequentially from top to bottom without
any cues to the topic hierarchy, visually-impaired users could easily have trouble
understanding how these topics are organized.

Volcano
Volcano classification
Erupted material
Lava composition
… (content) …
Lava texture
… (content) …
Shape
Shield volcanoes
… (content) …
Cinder cones
… (content) …
Stratovolcanoes
… (content) …
Supervolcanoes
… (content) …
Submarine volcanoes
… (content) …
Subglacial volcanoes
… (content) …
Classifying volcanic activity
… (content) …
Notable volcanoes
Volcanoes on Earth

Fig. 1. A webpage and the illustration showing its content organization

[Figure: a tree whose root is an Article, with 1st-level, 2nd-level, and 3rd-level topic nodes below it]
Fig. 2. Organization of the hierarchical structured content

[Figure: a Topic object with its two properties, topicName and contentList; contentList holds <content 1>, <content 2>, ...]
Fig. 3. Illustration of a Topic object

¹ http://en.wikipedia.org/wiki/Volcano, retrieved June 24th, 2006

A convenient way to preserve the organization of topics on a webpage is to arrange those
topics into a hierarchical structure, similar to the illustration in Fig. 2, and let a
webpage-reader work with this arranged content instead of the original one. Each circle in
Fig. 2 represents a Topic object, a data container for an individual user-defined topic. The
Topic object has two properties, namely topicName and contentList. The property topicName
stores a text string chosen to be the name of that Topic object, while the property
contentList stores a collection of text excerpts associated with the Topic object. In the
context of webpage-reader programs, this collection of text excerpts is what is read under
the topic associated with its Topic object. In the hierarchy, if Topic B is a child of
Topic A, we say that Topic B is a subtopic of Topic A. Topic objects on the same level and
with the same parent are called siblings. Every Topic object must belong to an Article.
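
As a rough illustration of the Topic object described above, the following Python sketch models a Topic with the two properties named in the text, topicName and contentList. The subtopics field, the Article class with its title field, and the outline helper are assumptions introduced here for illustration only; the nesting of the sample topics is likewise assumed and is not taken from the original webpage.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topic:
    """One node of the hierarchy: a name, its text excerpts, and its subtopics."""
    topicName: str
    contentList: List[str] = field(default_factory=list)
    subtopics: List["Topic"] = field(default_factory=list)  # assumed field name

    def add(self, child: "Topic") -> "Topic":
        """Attach a child Topic (a subtopic of this Topic) and return it."""
        self.subtopics.append(child)
        return child


@dataclass
class Article:
    """Root of the hierarchy; every Topic object belongs to an Article."""
    title: str  # assumed field, not named in the text
    topics: List[Topic] = field(default_factory=list)


def outline(topics: List[Topic], depth: int = 0) -> List[str]:
    """Return topic names in reading order, indented by their depth."""
    lines = []
    for topic in topics:
        lines.append("  " * depth + topic.topicName)
        lines.extend(outline(topic.subtopics, depth + 1))
    return lines


# Sample data: the nesting shown here is an assumption made for illustration.
shape = Topic("Shape")
shape.add(Topic("Shield volcanoes", ["... (content) ..."]))
shape.add(Topic("Cinder cones", ["... (content) ..."]))
article = Article("Volcano", [Topic("Volcano classification", subtopics=[shape])])

print("\n".join(outline(article.topics)))
```

Here "Shield volcanoes" and "Cinder cones" are siblings, since they share the parent "Shape". A webpage-reader working on such a structure could announce the outline first and read a contentList only when the user asks for that topic.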
Adding navigation commands based on this hierarchical structured content to voice-enabled
web browsers will let the visually impaired browse through the content more efficiently. The
implementation of such commands in a voice-enabled browser is described in a later section.
The next section describes how one can define the mapping from the content of a webpage into
the above hierarchical structure.
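
To make the idea of hierarchy-based navigation more concrete, the sketch below shows one possible cursor that moves through a topic tree. The method names (down, up, sibling, list_topics, read) are hypothetical stand-ins for voice commands and are not the command set of any actual browser; the parent field and the sample topics are also assumptions, and the root is modelled as a plain Topic for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Topic:
    topicName: str
    contentList: List[str] = field(default_factory=list)
    subtopics: List["Topic"] = field(default_factory=list)
    parent: Optional["Topic"] = field(default=None, repr=False)  # assumed back-link

    def add(self, child: "Topic") -> "Topic":
        child.parent = self
        self.subtopics.append(child)
        return child


class TopicCursor:
    """Tracks the topic the user is currently in; each method mimics one command."""

    def __init__(self, root: Topic):
        self.current = root

    def down(self) -> bool:
        """Enter the first subtopic, if there is one."""
        if self.current.subtopics:
            self.current = self.current.subtopics[0]
            return True
        return False

    def up(self) -> bool:
        """Return to the parent topic, if there is one."""
        if self.current.parent is not None:
            self.current = self.current.parent
            return True
        return False

    def sibling(self, step: int) -> bool:
        """Move to the next (step=+1) or previous (step=-1) sibling."""
        parent = self.current.parent
        if parent is None:
            return False
        index = parent.subtopics.index(self.current) + step
        if 0 <= index < len(parent.subtopics):
            self.current = parent.subtopics[index]
            return True
        return False

    def list_topics(self) -> List[str]:
        """Names of the subtopics directly below the current topic."""
        return [t.topicName for t in self.current.subtopics]

    def read(self) -> List[str]:
        """Text excerpts a TTS module would read for the current topic."""
        return list(self.current.contentList)


# Hypothetical sample tree.
root = Topic("Front page")
news = root.add(Topic("Breaking News"))
news.add(Topic("Headline 1", ["Story one."]))
news.add(Topic("Headline 2", ["Story two."]))
root.add(Topic("Sports"))

cursor = TopicCursor(root)
top_level = cursor.list_topics()
cursor.down()      # into "Breaking News"
cursor.down()      # into "Headline 1"
cursor.sibling(1)  # on to "Headline 2"
print(top_level, cursor.current.topicName, cursor.read())
```

Each method returns a boolean so that a caller could give spoken feedback when a move is not possible, for example when there is no further sibling.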

2.2 Obtaining Hierarchical Structured Content Using XML-based Template

Hierarchical structured content can be used to prepare webpage content for a
webpage-reader program. It preserves the organization of the webpage content when visual
formatting cannot, as in the case of visually-impaired users. Apart from arranging webpage
content in a meaningful organization, it is also desirable to control which text elements are
read and which are not. This control over what is read is another benefit of mapping normal
webpage content to the hierarchical structured one. In many webpage-reader programs, users
can set parameters to select which types of elements on the webpage should be read and which
should be left out. These parameters can be set on a page-by-page basis or applied globally
to every webpage. The downside, however, is that the selection of what is read is made by
visitors to the webpage, when it makes more sense to have the author of the webpage do the
preparation. An author who would like to control how a specific webpage-reader program reads
the content on his/her webpage might be able to do so partially by offering a suggested set
of parameters for visitors to apply to their webpage-reader programs. Still, this is not very
convenient, especially for visually-impaired visitors.
Here, we propose the use of XML-based parsing templates that describe the parsing rules
defining how the content of a webpage should be mapped into the hierarchical structure. A
program called “Hierarchical Structured Content Parser (HSC parser)”, which was implemented
as a part of this work, applies the parsing rules defined in a parsing template to the
webpage of interest and creates the associated hierarchical structured content, which is also
stored in an XML format. Since both the parsing template and the resulting hierarchical
structured content are in plain text, it should be simple to create parsing templates in a
normal text editor and to make use of the parsed content. Fig. 4 illustrates the hierarchical
structured content parsing process. Parsing rules are defined in parsing templates using XML.
They specify how Topic objects are created, as well as how their properties are filled.
[Figure: the webpage source and the parsing template (XML) are fed to the HSC parser, which outputs the hierarchical structured content (XML)]
Fig. 4. Obtaining hierarchical structured content of a webpage content from its source code
using HSC parser and the corresponding parsing template
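
Because parsing templates are plain XML, reading one into memory needs little more than a standard XML library. The sketch below is not the HSC parser itself: it only loads a simplified, hypothetical template into rule objects. The element names topic, amount, topicName, locator, tagname and topicList follow the template excerpt in Fig. 6, while the TopicRule class, its field names, and the sample template contents are assumptions made for illustration. Elements whose semantics are not spelled out in the text, such as begin and end, are deliberately left out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

# Simplified, hypothetical template; a real template carries more detail.
TEMPLATE = """
<topic amount="1">
  <topicName>Breaking News</topicName>
  <topicList>
    <topic amount="1">
      <topicName><locator><tagname>h3</tagname></locator></topicName>
    </topic>
    <topic amount="*">
      <topicName><locator><tagname>h5</tagname></locator></topicName>
    </topic>
  </topicList>
</topic>
"""


@dataclass
class TopicRule:
    amount: str                   # "1" or "*", exactly as written in the template
    literal_name: Optional[str]   # fixed topic name, when given as plain text
    name_tag: Optional[str]       # HTML tag supplying the name, when a locator is given
    children: List["TopicRule"] = field(default_factory=list)


def load_rule(elem: ET.Element) -> TopicRule:
    """Convert one <topic> element, and its nested <topic> elements, into rules."""
    name = elem.find("topicName")
    tag = name.findtext("locator/tagname") if name is not None else None
    literal = None
    if name is not None and tag is None:
        literal = (name.text or "").strip() or None
    children = [load_rule(child) for child in elem.findall("topicList/topic")]
    return TopicRule(elem.get("amount", "1"), literal, tag, children)


rule = load_rule(ET.fromstring(TEMPLATE))
print(rule.literal_name, [(c.amount, c.name_tag) for c in rule.children])
```

Keeping the loaded rules separate from the code that applies them to a webpage means the same template can be checked or edited without touching the page-processing logic.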
3. RESULTS AND DISCUSSION

3.1 Implementation in CUVoiceBrowser

CUVoiceBrowser is a voice-enabled web browser integrated with Thai automatic speech
recognition and multilingual TTS modules. Both the visually impaired and people with normal
sight were taken into consideration in the design of CUVoiceBrowser. Navigation can be done
via both traditional and voice inputs. Together with the webpage reading capability of the
TTS, voice commands allow users to perform most of the tasks required for accessing
mainstream content on the web, including going to desired URLs, following links shown on the
webpage, asking for the list of links on the webpage, opening the user’s bookmark page,
navigating forward and backward, activating the user’s pre-defined search procedure, filling
in web forms and performing searches using character-wise data entry, controlling the text
reading of the TTS module, requesting instructions from its help page, and performing simple
program controls. CUVoiceBrowser automatically extracts webpage content from its surrounding
formatting tags. By default, reading is performed sequentially from the top to the bottom of
the page. Different voices are used to distinguish between normal and hyperlinked text in
order to help visually-impaired users identify links they can follow.
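
The default sequential reading with distinct voices for links can be pictured with a small sketch. It is not CUVoiceBrowser's code: the class name VoiceSegmenter and the voice labels normal_voice and link_voice are hypothetical, and a real browser would hand each segment to its TTS module rather than print it. The sketch strips formatting tags using Python's standard html.parser module and labels each text run according to whether it sits inside a hyperlink.

```python
from html.parser import HTMLParser


class VoiceSegmenter(HTMLParser):
    """Splits page text into (voice, text) runs in order of appearance."""

    def __init__(self):
        super().__init__()
        self.segments = []      # list of (voice label, text)
        self._link_depth = 0    # greater than 0 while inside <a href=...>
        self._skip_depth = 0    # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and any(key == "href" for key, _ in attrs):
            self._link_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._link_depth:
            self._link_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text or self._skip_depth:
            return
        voice = "link_voice" if self._link_depth else "normal_voice"
        self.segments.append((voice, text))


parser = VoiceSegmenter()
parser.feed('<p>Read the <a href="/news">latest news</a> today.</p>')
parser.close()
for voice, text in parser.segments:
    print(f"[{voice}] {text}")
```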
The HSC parser was integrated into CUVoiceBrowser so that the browser supports
hierarchical structured content parsing. CUVoiceBrowser was also modified so that when it
retrieves a webpage from a URL, it looks for an appropriate parsing template. Parsing
templates can be found in two ways. First, the author of a webpage can provide the URL of the
parsing template designed for that webpage in the head section of the webpage’s source.
Second, a template can be found in the browser’s pre-loaded parsing template inventory. When
the browser engine receives a webpage via an HTTP response and a parsing template associated
with that webpage is available, the source of the webpage is fed to the HSC parser together
with its parsing template. The hierarchical structure for the webpage content is then created
and stored in the browser’s memory. Using an extended set of voice commands, users can
navigate through the hierarchical structure and have the TTS read the content or the name of
the desired topic, as well as the list of topics at any level of the structure. The original
voice commands can still be used, and they are applied to the content of the topic that the
browser is currently in.
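
The two ways of locating a parsing template could be sketched as follows. The text does not say how the template URL is written in the head section, so the link element with rel="hsc-template" used here is purely an assumption, as are the function name find_template, the prefix-keyed inventory dictionary, and the choice to prefer the author-supplied template over the inventory.

```python
from html.parser import HTMLParser
from typing import Dict, Optional
from urllib.parse import urljoin

TEMPLATE_REL = "hsc-template"  # hypothetical rel value, not defined in the paper


class _HeadLinkFinder(HTMLParser):
    """Finds the first <link rel="hsc-template" href="..."> inside <head>."""

    def __init__(self):
        super().__init__()
        self.href = None
        self._in_head = False

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self._in_head = True
        elif tag == "link" and self._in_head and self.href is None:
            attributes = dict(attrs)
            if attributes.get("rel") == TEMPLATE_REL and attributes.get("href"):
                self.href = attributes["href"]

    def handle_endtag(self, tag):
        if tag == "head":
            self._in_head = False


def find_template(page_url: str, html: str, inventory: Dict[str, str]) -> Optional[str]:
    """Return a template location for the page, or None if none is known."""
    finder = _HeadLinkFinder()
    finder.feed(html)
    finder.close()
    if finder.href:  # 1) template declared by the page author
        return urljoin(page_url, finder.href)
    for prefix, template in inventory.items():  # 2) pre-loaded inventory
        if page_url.startswith(prefix):
            return template
    return None  # no template: fall back to plain sequential reading


page = '<html><head><link rel="hsc-template" href="/tpl/front.xml"></head><body></body></html>'
print(find_template("http://example.org/index.html", page, {}))
```

When no template is found, the browser would presumably keep its default behaviour of reading the page from top to bottom.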

3.2 An Example Usage

To illustrate its usage, we applied our approach to organizing webpage content on some
frequently-updated webpages. Here, we present a demonstration of the approach on the front
page of The New York Times (http://www.nytimes.com). The format of this webpage was analyzed,
the targeted hierarchical structure of the desired content on the webpage was defined, and
the corresponding parsing template was constructed and used. Fig. 5 shows a portion of the
webpage where the breaking news items (A, B) are listed. A typical webpage-reader, reading
sequentially, will not do a good job of narrating the content in this portion. In contrast,
we can write a parsing template that looks for news headlines in section A and section B by
searching for specific formatting tags (in this case, <div>) and makes each of them, together
with its corresponding content, a Topic object. These Topic objects are listed as subtopics
of another Topic object named “Breaking News”, and their topic names are extracted from the
<h3> elements (for the first news item) and the <h5> elements (for the other news items) in
the vicinity of each Topic object. The part of the parsing template that defines how to
create Topic objects in section A is shown in Fig. 6. Note that some details of the XML are
omitted in the figure. Once the webpage has been parsed into the hierarchical structured
content, CUVoiceBrowser allows users to navigate through the parsed content.
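
The effect of such a template on the breaking-news portion can be approximated with a few lines of Python. This is a hand-written stand-in for what a template-driven parser would produce, not the HSC parser: the HTML sample is invented, the HeadlineParser class is hypothetical, and nested <div> elements are not handled. Each <div> that contains an <h3> or <h5> becomes one topic, the heading text becomes its topicName, and the remaining text becomes its contentList.

```python
from html.parser import HTMLParser

# Invented sample markup standing in for the breaking-news portion of a page.
SAMPLE = """
<div><h3><a href="#">Lead story</a></h3><p>Summary of the lead story.</p></div>
<div><h5><a href="#">Second story</a></h5><p>Summary two.</p></div>
<div><h5>Third story</h5><p>Summary three.</p></div>
"""


class HeadlineParser(HTMLParser):
    """Turns each <div> holding an <h3> or <h5> into one topic dictionary."""

    NAME_TAGS = {"h3", "h5"}

    def __init__(self):
        super().__init__()
        self.topics = []       # finished topics, in page order
        self._item = None      # topic being built while inside a <div>
        self._in_name = False  # True while inside <h3> or <h5>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._item = {"topicName": "", "contentList": []}
        elif tag in self.NAME_TAGS and self._item is not None:
            self._in_name = True

    def handle_endtag(self, tag):
        if tag in self.NAME_TAGS:
            self._in_name = False
        elif tag == "div" and self._item is not None:
            if self._item["topicName"]:  # keep only blocks that had a headline
                self.topics.append(self._item)
            self._item = None

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text or self._item is None:
            return
        if self._in_name:
            self._item["topicName"] = (self._item["topicName"] + " " + text).strip()
        else:
            self._item["contentList"].append(text)


parser = HeadlineParser()
parser.feed(SAMPLE)
parser.close()

# The headline topics become subtopics of a parent topic named "Breaking News".
breaking_news = {"topicName": "Breaking News", "contentList": [], "topicList": parser.topics}
print([t["topicName"] for t in breaking_news["topicList"]])
```

A user could then ask for the list of topics under “Breaking News” and hear only the headlines, instead of every piece of text on the page in source order.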

4. CONCLUSIONS

We proposed a method to arrange webpage content into a hierarchical structure using
XML-based parsing templates. When webpage content is arranged in such a structure, a
webpage-reader program that supports the structured content can read the content to users in
a more organized and controllable manner. This will help the visually impaired navigate
through the content of a webpage in a content-driven fashion. Such an approach is our next
step in helping the visually impaired access web-based information beyond the use of a
voice-enabled web browser with a traditional webpage-reader module.
Although the parsing templates are flexible and small in size due to their text-based
nature, writing them might still be too complicated for someone new to markup languages. A
graphical user interface could be developed so that one can define Topic objects via simple
mouse actions on the webpage itself rather than by working with tags in its source code.
Another interesting direction, which would make the read webpage content as informative as
the original content with its visual formatting, is to use various aspects of voice quality,
together with other sounds, to communicate the intention of each visual formatting element.
For example, a soft beep could be played while a hyperlink is read, or a higher-pitched voice
could be used to read blinking text. Psychological studies are needed to evaluate such sound
representations.

Fig. 5. A portion of a front page of The New York Times website


<topic amount="1">  <!-- Definition of a Topic object named “Breaking News” -->
  <begin>…</begin>
  <end>…</end>
  <body/>
  <topicName>Breaking News</topicName>
  <contentList/>
  <topicList>
    <topic amount="1">  <!-- Definition of a Topic object named after the <h3> element -->
      <begin>…</begin>
      <end>…</end>
      <body/>
      <topicName>
        <locator><tagname>h3</tagname></locator>
      </topicName>
      <contentList>…</contentList>
    </topic>
    <topic amount="*">  <!-- Definition of an unspecified number of Topic objects named after <h5> elements -->
      <begin>…</begin>
      <end>…</end>
      <body/>
      <seperator>…</seperator>
      <topicName>
        <locator><tagname>h5</tagname></locator>
      </topicName>
      <contentList>…</contentList>
    </topic>
  </topicList>
</topic>
Fig. 6. Portion of the parsing template used to parse the content in section A

REFERENCES

[1] Lee, Young Sook, 2005: The Impact of ICT on Library Services for the Visually
Impaired: 8th International Conference on Asian Digital Libraries (ICADL2005).
Bangkok, Thailand. 44-51.
[2] Zagler, W. L., Mayer, P., 1992: Microprocessor Devices to Lower the Barriers for the
Blind and Visually Impaired: Journal of Microcomputer Applications 15(1): 57-64.
[3] Nolan, Y.M., de Paor, A., 2005: Phoneme Recognition Based Software System for
Computer Interaction by Disabled People: The Int. Conf. on Computer as a Tool,
2005 (EUROCON 2005) Vol.1. 394-397.
[4] Rabiner, L.R., 1989: A Tutorial on Hidden Markov Models and Selected
Applications in Speech Recognition: Proc. of the IEEE Vol.77(2) 257-286.
[5] Hemphill, C.T., Thrift, P.R., 1995: Surfing the Web by Voice: ACM Multimedia 95-
Electronic Proceedings, San Francisco, CA, USA.
[6] Brondsted, T., Aaskoven, E., 2005: Voice-Controlled Internet Browsing for Motor-
handicapped Users, Design and Implementation Issues: Interspeech 2005. 9th
European Conference on Speech Communication and Technology (Interspeech2005).
Lisboa, Portugal.
[7] Punyabukkana, P., Chirathivat, J., Maekwongtrakarn, J., Chanma, C., Suchato, A.,
2005: The Implementation of CUVoiceBrowser, a Voice Web Navigation Tool for
the disabled Thais: Unpublished paper.
