
IEEE Proceedings of 4th International Conference on Intelligent Human Computer Interaction, Kharagpur, India, December 27-29, 2012

VoiceMail Architecture in Desktop and Mobile Devices for the Blind People
Tirthankar Dasgupta, Aakash Anuj, Manjira Sinha, Ritwika Ghose, Anupam Basu
Indian Institute of Technology Kharagpur
{iamtirthankar, aakashanuj.iitkgp, manjira87, ritwika.ghose, anupambas}@gmail.com

Abstract—Advances in computer based accessible systems have opened up many avenues for the visually impaired across much of the globe. Audio feedback based virtual environments such as screen readers have helped Blind people immensely in accessing internet applications. However, a large section of visually impaired people in different countries, in particular the Indian subcontinent, could not benefit much from such systems. This was primarily due to the difference between the technology required for Indian languages and that available for other popular languages of the world. In this paper, we describe the VoiceMail system architecture, which can be used by a Blind person to access e-mails easily and efficiently. The contribution of this research enables Blind people to send and receive voice based e-mail messages in their native language with the help of a computer or a mobile device. The GUI of our proposed system has been evaluated against the GUI of a traditional mail server. We found that our proposed architecture performs much better than the existing GUIs.

978-1-4673-4369-5/12/$31.00 © 2012 IEEE

I. INTRODUCTION
The present digital era is witness to a rapid and overwhelming growth in the interactive product design sector. Digital devices like computers, smartphones, and tablets, along with internet technology, are getting cheaper and more accessible to the common people. As a result they are no longer mere technologies; rather, they have become a part of our daily lives. The inhabitants of this virtual world come from all strata and classes of society, each with specific needs and an array of choices to fulfill. For a long time, differently abled people were deprived of these benefits, but the advent of state of the art assistive technologies has now opened up many avenues for them. People with vision difficulty or Blindness have benefited immensely from different computer based systems like automatic text-to-Braille transliteration systems [1], [3], [4], [5], [9], [13], [17] and audio feedback based virtual environments using automatic speech recognition (ASR) and text to speech (TTS) converters [8], [20], [21], [22]. These systems have enabled Blind people to explore the power of cutting edge technologies and also to communicate effectively with other people, with the flexibility and accessibility enjoyed by their sighted counterparts.
One of the revolutionary electronic technologies of the present day is electronic mail, or e-mail. E-mail has become the primary means of communication and productivity for almost all groups of people. It offers quick and easy sharing of ideas and information and, at the same time, is cheaper than traditional telephone communication, especially over long distances. E-mail also provides a sense of privacy, as access to one's account is restricted, along with many other functionalities such as creating, storing and organizing documents. E-mails are also the gateway to the social networking communities. However, most web based e-mail systems like Gmail are designed in such a way that Blind people face accessibility problems [25].
People with vision problems generally access e-mail systems with the help of screen readers like JAWS(1) and NVDA(2) and Automatic Speech Recognizers (ASR) [18], because a conventional keyboard approach is extremely problematic: the person has to memorize the keyboard layout, which is especially hard if the layout is unfamiliar. A screen reader is an application that attempts to extract the textual information displayed on a given desktop screen and present it to the user through an integrated text to speech engine or a Braille output device. An ASR application, on the other hand, lets a Blind person convert a spoken signal into text. Therefore, a Blind person can use a screen reader to identify and interpret the information available from the mailing client interface, and then use the ASR to record a voice message and convert it into text. Finally, this text message is delivered to the recipient's address through the mailing GUI.

(1) http://www.freedomscientific.com/products/fs/jaws-product-page.asp
(2) http://www.nvda-project.org/

Although these technologies are being improved continuously, some major problems still persist that make them unusable as a way of accessing e-mail for a large segment of Blind people. This can be primarily attributed to the following reasons:
a. With the advancement of digital technologies, a large number of user centric applications and features are nowadays being added to the graphical user interfaces of mailing clients. These interfaces, including their audio interfaces, are highly cluttered and have poorly constructed alternative texts for images or graphical navigation tools. Screen reader technology, however, is not being updated at the same pace. Therefore, it is becoming rather difficult for a Blind person to access e-mail systems easily and effectively with the help of a screen reader.
b. The development of ASR systems is still at a nascent stage. Under adverse conditions, such as a noisy environment, the performance of an ASR may degrade drastically. Thus, the text message generated by the ASR
system may be erroneous.
c. There are around 7000 different languages in the world(3), and they differ from each other in structure and usage. Of this vast number of languages, only a handful are fortunate enough to have their own TTS or ASR systems. Both of these technologies are highly language dependent, so a system developed for one language is not applicable to another. Moreover, as languages have their own specific features, the technologies have to be developed from scratch for each language. This requires a huge amount of time, money and effort.
d. Many of the available screen reader and ASR technologies have specific platform and configuration requirements. They are also not always open source or free to access. Therefore, for users who are not economically well off, these technologies remain inaccessible.
e. Apart from the high popularity of desktop PCs, the penetration of mobile devices among common people is growing exponentially, irrespective of age or of social, economic or educational factors. It is therefore becoming more challenging to cater to the needs of people from these varying categories. Moreover, the tools and technologies discussed above for Blind users are mostly unavailable in mobile environments, specifically in the local languages.
f. Further, these systems are often too heavy for a small scale application like e-mail.

(3) http://en.wikipedia.org/wiki/Minority_language

From the above discussion, we can see that accessing e-mail through a full-fledged screen reader system integrated with local language TTS and ASR technology requires huge resources and cost, not only in procurement but also in computational power. Further, porting such systems to small devices like mobile phones and tablets adds another level of complication.
Keeping all of this in view, the aim of our work is to provide Blind people with a lightweight alternative for accessing the basic features of an e-mail system while bypassing the limitations and problems mentioned above. To achieve this goal, we have designed an open source, lightweight voice e-mailing system that can be used by a Blind person to send e-mail through voice recordings. The system allows a Blind person to record her voice and, instead of converting the speech to text, directly sends the recorded voice message to the recipient's mail address as an attachment. The system also provides an option to access the mail inbox on behalf of the user and play back a received voice message. Moreover, our system is portable to small hand-held devices like Android based touch-phones. This technique has the following advantages:
• The system provides proper character level voice feedback during key presses.
• The system provides an intuitive, interactive and easy to use GUI that can be used by a Blind user even if they are not computer literate.
• Since no ASR or TTS system is used, the language dependency of the system is reduced drastically. The system is also lightweight and only requires the device to have a sound recording and playback facility.
• The option of sending voice messages will not only help a Blind person to access e-mail, but may also help sighted people who cannot type text due to illiteracy, or whose languages do not yet have a keyboard layout or digital font.
• As our system is open source, any interested party can extend it to incorporate more advanced e-mail features.
The paper is organized as follows: Section II presents related work, briefly discussing the different attempts made to allow a Blind person to use the web, in particular e-mail applications; Sections III to V detail the architecture of the desktop and mobile versions of our proposed system; Section VI describes the evaluation of our system using the KLM model; and Section VII concludes our work and proposes some future directions.

II. RELATED WORKS
There is a rich literature on technological advances in building assistive tools for visually impaired people. These include the development of text to Braille systems [1], [3], [4], [5], [9], [13], [17], screen magnifiers [10], [16], [2], and screen readers [8], [20], [21], [22]. However, as discussed previously, these systems are highly language dependent and presently available only for a few languages like English, French, German, Swedish, Spanish, and Portuguese [8], [20], [21], [22]. Recently, attempts have been made to develop tools and technologies that help Blind people access internet technologies. Among the early attempts, voice input for surfing [12] was adopted for Blind people. IBM's Home Page Reader [23] presents the web page in an easy-to-use interface and converts the text to speech, using voices of different genders for reading text and links. However, its disadvantage is that the developer has to design a complex new interface for the complex graphical web pages to be browsed and for the screen reader to recognize.
Another work puts forward a simple browsing solution that divides a web page into two dimensions [7]. This greatly simplifies a web page's structure and makes it easier to browse. Another web browser [19] generated a tree structure from the HTML document by analyzing links. Although this attempted to structure the pages that are linked together to enhance navigability, it did not prove very efficient for surfing. Furthermore, it did not handle issues regarding navigability and usability of the current page itself. The WebbIE browser [15] extracted the text, removed the images and used their alt names instead, and represented the page as plain text, making it easy for any screen reader to present to the user. However, even this browser did not enhance navigability. Another browser
developed for visually handicapped people was eGuideDog [26], which had an integrated TTS engine. This system applies an advanced text extraction algorithm to represent the page in a user-friendly manner. However, it still did not meet the standards required for commercial use.
Considering the Indian scenario, ShrutiDrishti [24] and Web-Browser for Blind [11] are two web browser frameworks used by Blind people to access the internet, including e-mail. Both systems are integrated with Indian language ASR and TTS systems. However, neither prototype is commercially available, and to the best of our knowledge neither can be ported to small devices like mobile phones or PDAs.

III. THE SYSTEM ARCHITECTURE

Fig. 1. Block Diagram of the VoiceMail Architecture

The architecture of our proposed system is depicted in Figure 1. The diagram shows the major components of the present system, which are:
a. User selection module
b. Mailing options: Compose or Check Inbox
c. Accessibility options: text based messages or voice based messages
d. The interactive GUI framework: an interactive GUI with voice based feedback to key press operations that helps a Blind person to access Gmail efficiently.
e. Mouse click based accessibility for the desktop framework.
As mentioned earlier, the voice mailing system was built both for desktop computers and for mobile devices. The system changes some of its configuration based on the selected device. In the following sections we discuss the working of the proposed system on both desktop and mobile platforms. Both platforms share the same concept of mailing the recorded voice of the sender to the recipient. However, the GUI differs between the two platforms. In the following subsections we discuss each of these modules in detail.

IV. VOICE MAILING SYSTEM: DESKTOP ARCHITECTURE
A. User Selection Module
The user selection module provides an option to specify the type of user using the system. There are two options: a) Blind user and b) sighted user. The difference between the two is that, for Blind users, every operation performed receives voice based feedback, whereas a sighted person receives textual feedback. Moreover, there is an option to save a particular user's profile so that the user does not have to enter the same details again.
B. Mailing Options
The mailing options comprise two modules: a) Compose Mail and b) Inbox Check. A user can choose either one depending on the task at hand.
• Compose Mail: In the compose mail module, the compose window opens, giving the user the option either to type a text or to record a voice message. To record a voice message, the user can either click the “start Recording” button or press the left mouse button anywhere on the screen. The GUI has been designed so that, irrespective of the position of the mouse pointer, the mouse click is registered and the system responds accordingly. To stop the recording, the user can again either click the “stop Recording” button or release the left mouse button anywhere on the screen (provided the recording was started by pressing the left mouse button). Once the recording is over, the system asks the user to select the recipient's mailing address. This is done by reading out, in alphabetical order, all the mail ids stored for the sender. Once the recipient mail id is entered, the system prompts the user to send the mail or to cancel the operation. To send the mail, the user can either press the “send mail” button or middle-click the mouse. We define all the mouse click operations in detail in the following sections.
• Check Inbox: In the Inbox module, the Blind user can check the incoming voice mails. For convenience, the system allows the Blind user to select one of two options: checking the first 10 mails, or checking all the mails sequentially. This number can easily be altered to suit the best possible use.
After the user chooses an option, the system starts to read out the e-mail ids of the senders according to that choice. For each e-mail id, the system asks whether the user wants to listen to that voice mail, and halts for a moment to receive the response. It then performs the corresponding action and advances to the next mail.
C. Accessing the GUI using Mouse Key Press
The GUI of the interface also captures the mouse operations performed by the user. Instead of searching for the short-cut
keys on the keyboard, a user can send the same keyboard commands by performing different mouse operations. The proposed voice mail system creates a mouse hook process to capture the different possible mouse clicks. Each mouse click operation is mapped to a certain keyboard operation. The mapping rules are customizable and can easily be changed based on the user's requirements. Some examples of the mapping rules are shown in Table I. Operating the system with the mouse is particularly helpful to a visually impaired person, and our experiment has shown that certain users prefer mouse operations to keyboard operations.

TABLE I
DIFFERENT MOUSE CLICK OPERATIONS

Mouse Click           GUI Operations
Left, single          NOP
Left, double          Compose Mail
Left, triple          Cancel Mail
Middle single         Send Mail
Middle double         NOP
Right single          Check Inbox
Right double          NOP
Right triple          NOP
Mouse Scroll Up       Select Next Mail
Mouse Scroll Down     Previous Mail

V. VOICE MAILING SYSTEM: MOBILE ARCHITECTURE
In order to use the voice mail system on mobile devices, we have designed the GUI of the system in such a way that a Blind person has to perform a minimum number of operations to send or receive a mail. Mobiles running Android offer a feature rich platform for deploying embedded applications, and the latest versions of Java NetBeans support the development of Android based applications, producing executable programs that run in an emulator or on an actual mobile device connected by a USB cable. Thus, we created a version of the same desktop application running on an Android based embedded platform. However, due to hardware constraints, deploying the system to portable devices required redesigning the entire GUI and optimizing it for the reduced screen size of a mobile device. Moreover, most of the system's capabilities have been entirely reworked to run smoothly on devices with lower clock speeds than PCs. The GUI was designed using the open source Android Java library. Roughly, the hardware requirements for our Android version of the application are:
1) A touch screen device, preferably of form factor 3.5” x 3.5”.
2) Android devices running Android OS version 2.3 or higher.
3) CPU speed ≥ 400 MHz.
4) At least 50 MB of free phone memory, with support for SD card installation.
5) At least 100 MB of secondary storage.
6) Option to connect to the internet.
Sample screenshots of the embedded version of the VoiceMail system are shown in Figure 2(a)-(d). Similar to the desktop architecture, the mobile version of the VoiceMail system provides the user with two options: a) compose and b) check Inbox. However, unlike the desktop interface, where the choices can be selected via mouse click, the mobile interface divides the screen into two different sections (see Figure 2(a)). Touching the upper section of the mobile screen selects the “compose mail” option and touching the bottom section selects the “Check Inbox” option. The two options are separated by a blank area in the middle of the screen to avoid multiple touches. Choosing the “compose mail” option first asks the user to enter the recipient's address and then gives the user the option either to enter a text message or to record a voice message. As on the previous screen, the mobile screen is again divided into two segments by a horizontal separator. Touching the upper half of the screen enables the user to type a text message, whereas touching the bottom half enables the voice message mode. After the recording is over, the system automatically saves the recorded voice and mails it to the recipient's address.

VI. SYSTEM EVALUATION
A. Evaluating the Desktop Architecture
Our primary goal is to compute the task execution time using the present GUI design. We compute the task execution time using the Keystroke Level Model (KLM) [14], [6]. To accomplish a given task, a user must perform certain keyboard and mouse operations. The KLM identifies these operations and assigns a time value to each of them. These time values are then added to get the final task execution time. The keyboard and mouse operations with their estimated times, as computed and published in [6], are listed in Table II.

TABLE II
KEYBOARD / MOUSE OPERATIONS WITH THEIR ESTIMATED TIME IN KLM

Operations                                         Time (in sec.)
Key press and release (K)                          0.28
Move the mouse to an object on screen (P)          1.1
Button press or release (mouse) (B)                0.1
Hand from keyboard to mouse or vice versa (H)      0.4
Mental Preparation (M)                             1.2
Type string of n characters (T(n))                 n*K

We define the task of a user as “Sending a message to a person through e-Mail”. The steps required to accomplish this task using the GUI of a standard Gmail client and using our proposed VoiceMail system are given in Table IV and Table III, respectively. One of the key observations we made from Table II is that a mouse pointer movement takes around 0.9 sec longer than a mouse click (1.1 sec against 2*0.1 sec for a press and a release). Hence, we tried to enhance our present method of Mouse Movement and Selection so that we can reduce the number of mouse movement operations. Consequently, we came up with a solution by which a user can perform most of the operations on the VoiceMail system with the help of a three button mouse.
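The three button mouse scheme rests on the click-to-command mapping of Table I. A minimal sketch of such a customizable mapping is given below. It is only an illustration under stated assumptions: the names `Action`, `DEFAULT_RULES`, `resolve` and `customize` are hypothetical and are not taken from the VoiceMail implementation, and the capture of the actual mouse events (the mouse hook process) is left out.

```python
from enum import Enum
from typing import Dict, Tuple


class Action(Enum):
    """GUI operations listed in Table I (names are hypothetical)."""
    NOP = "no operation"
    COMPOSE_MAIL = "compose mail"
    CANCEL_MAIL = "cancel mail"
    SEND_MAIL = "send mail"
    CHECK_INBOX = "check inbox"
    NEXT_MAIL = "select next mail"
    PREVIOUS_MAIL = "previous mail"


# (button, click count) -> action, copied row by row from Table I.
DEFAULT_RULES: Dict[Tuple[str, int], Action] = {
    ("left", 1): Action.NOP,
    ("left", 2): Action.COMPOSE_MAIL,
    ("left", 3): Action.CANCEL_MAIL,
    ("middle", 1): Action.SEND_MAIL,
    ("middle", 2): Action.NOP,
    ("right", 1): Action.CHECK_INBOX,
    ("right", 2): Action.NOP,
    ("right", 3): Action.NOP,
    ("scroll_up", 1): Action.NEXT_MAIL,
    ("scroll_down", 1): Action.PREVIOUS_MAIL,
}


def resolve(button: str, clicks: int, rules=DEFAULT_RULES) -> Action:
    """Look up the GUI action for a mouse event; unmapped events do nothing."""
    return rules.get((button, clicks), Action.NOP)


def customize(overrides, base=DEFAULT_RULES):
    """Return a new rule table with user specific overrides applied."""
    rules = dict(base)
    rules.update(overrides)
    return rules
```

Keeping the rules in a plain lookup table is one simple way to make them customizable per user: `customize({("right", 2): Action.SEND_MAIL})` returns a new table and leaves the defaults untouched.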
TABLE IV
ACTION SEQUENCE FOR “MOUSE MOVEMENT AND SELECTION” METHOD FOR STANDARD GMAIL GUI

Action Performed                       Time (in sec.)
Press 'c' to compose                   1*0.28 = 0.28
Type recipient email address           0.28*10 = 2.8
Type a message (T(n))                  0.28*50 = 14
Move mouse over the 'send' button      1*1.1 = 1.1
Click on the 'send' button             2*0.1 = 0.2
Estimated Time (ET_Gmail)              4.38
Total Time (T_Gmail)                   ET_Gmail + T(n) = 4.38 + T(n)
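The arithmetic behind Table IV can be reproduced with a few lines of code. The sketch below uses the operator times of Table II; the names `KLM_TIMES`, `klm_time` and `typing_time` are hypothetical helpers introduced only for this illustration, and the 10-character address and 50-character message are the lengths assumed in the table.

```python
# KLM operator times in seconds, taken from Table II.
KLM_TIMES = {"K": 0.28, "P": 1.1, "B": 0.1, "H": 0.4, "M": 1.2}


def klm_time(steps):
    """Sum the KLM estimate for a list of (operator, count) pairs."""
    return round(sum(KLM_TIMES[op] * count for op, count in steps), 2)


def typing_time(n_chars):
    """T(n) = n * K, the time to type a string of n characters."""
    return round(n_chars * KLM_TIMES["K"], 2)


# Fixed part of the Gmail sequence in Table IV (message typing excluded).
gmail_fixed = klm_time([
    ("K", 1),    # press 'c' to compose
    ("K", 10),   # type a 10-character recipient address
    ("P", 1),    # move the mouse over the 'send' button
    ("B", 2),    # press and release the mouse button
])

print(gmail_fixed)                                # fixed part, ET_Gmail
print(round(gmail_fixed + typing_time(50), 2))    # total for a 50-character message
```

The first value agrees with the estimated time of 4.38 sec in Table IV; the second adds T(50) = 14 sec for the typed message.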

Fig. 2. The screen shots of the mobile based VoiceMail system, panels (a)-(d).

TABLE III
ACTION SEQUENCE FOR “MOUSE MOVEMENT AND SELECTION” METHOD FOR VOICEMAIL GUI

Action Performed               Time (in sec.)
Left click to compose          2*0.1 = 0.2
Type recipient address         0.28*10 = 2.8
Middle click, start record     1*0.1 = 0.1
Record message                 δ
Middle click, stop record      1*0.1 = 0.1
Right click, send message      2*0.1 = 0.2
Estimated Time (ET_Vmail)      3.4
Total Time (T_Vmail)           ET_Vmail + δ = 3.4 + δ

The various mouse click operations and their corresponding GUI commands are explained in Table I of Section IV-C. We perform the same task of “Sending a message to a person through e-Mail” using the Mouse Click Operations method explained above. Table III presents the different actions and the time required to accomplish the given task. Note that in Table III the time required to record a voice message is kept as a variable δ. This is because the KLM model does not account for the time taken to speak a particular character or word. Nevertheless, we assume that δ < T(n). Since ET_Vmail = 3.4 is also smaller than ET_Gmail = 4.38, it follows that T_Vmail = 3.4 + δ < 4.38 + T(n) = T_Gmail. Under this assumption, we conclude that accessing the Gmail client using the VoiceMail system reduces the overall estimated execution time of a Blind user as compared to the standard GUI of the Gmail client.

VII. CONCLUSION AND FUTURE WORKS
The VoiceMail system architecture presented in this paper is an attempt to help the Blind population access essential electronic communication modes like e-mail. We present both desktop and mobile architectures for the same. The system allows a Blind person to send voice based e-mail messages. This reduces the extensive cognitive load that a Blind person bears in remembering and typing characters using a keyboard or a mobile keypad. Further, as messages are sent via voice, a Blind person no longer needs proficiency in English. We have evaluated our proposed architecture by comparing the performance of our proposed GUI with that of the existing Gmail GUI. Our preliminary result shows that, for a Blind person, the GUI of the proposed VoiceMail system requires less estimated task execution time than the existing Gmail GUI.
However, the present system has several limitations that we intend to address in the next phase of our work. Although composing a mail through voice is easy, in the present version of the system a Blind person has to enter the recipient's mail address through the keyboard. This ultimately requires knowledge of the keyboard. We can partially solve this problem by integrating a separate configuration GUI with the present system. Through the configuration GUI, a Blind person can associate a voice based nickname with each recipient mail id, and while composing a mail, the recipient's mail id can be looked up in the list of these voice based nicknames.

REFERENCES
[1] B. Anupam, S. Roy, P. Dutta, and S. Banerjee. A PC based multi-user
braille reading system for the blind libraries. IEEE Transactions on
Rehabilitation Engineering, 6(1):60–68, 1998.
[2] A. Baude, P. Blenkhorn, and G. Evans. The architecture of a windows
9x full screen magnifier using ddi hooking. Assistive Technology–Added
Value to the Quality of Life (AAATE 2001), IOS Press, Amsterdam, pages
113–118, 2001.
[3] P. Blenkhorn. A system for converting braille into print. Rehabilitation
Engineering, IEEE Transactions on, 3(2):215–221, 1995.
[4] M. Blomquist and P. Burman. The winbraille approach to producing
braille quickly and effectively. Computers Helping People with Special
Needs, pages 275–285, 2002.
[5] brailler. Retrieved from www.brailler.com.
[6] S.K. Card, T.P. Moran, and A. Newell. The keystroke-level model for
user performance time with interactive systems. Communications of the
ACM, 23(7):396–410, 1980.
[7] C.N. Chu. Two dimension interactive voice browser for the visually
impaired. Computers Helping People with Special Needs, pages 623–
623, 2004.
[8] Dolphin computer access. Retrieved from
www.dolphinuk.co.uk/products/hal.htm.
[9] T. Dasgupta and A. Basu. A speech enabled indian language text
to braille transliteration system. In Information and Communication
Technologies and Development (ICTD), 2009 International Conference
on, pages 201–211. IEEE, 2009.
[10] D.G. Davis. Computer screen magnifier, September 25 1990. US Patent
4,958,907.
[11] R. Ghose, T. Dasgupta, and A. Basu. Architecture of a web browser
for visually handicapped people. In Students’ Technology Symposium
(TechSym), 2010 IEEE, pages 325–329, April 2010.
[12] C. Hemphill and P. Thrift. Surfing the web. DROIDS Made Simple,
pages 239–256, 2011.
[13] N. Kalra, T. Lauwers, D. Dewey, T. Stepleton, and M.B. Dias. Iterative
design of a braille writing tutor to combat illiteracy. In Information
and Communication Technologies and Development, 2007. ICTD 2007.
International Conference on, pages 1–9. IEEE, 2007.
[14] D. Kieras. Using the keystroke-level model to estimate execution times.
University of Michigan, 2001.
[15] A. King, G. Evans, and P. Blenkhorn. Webbie: a web browser for visu-
ally impaired people. In Proceedings of the 2nd Cambridge Workshop
on Universal Access and Assistive Technology, Springer-Verlag, London,
UK, pages 35–44. Citeseer, 2004.
[16] S. Kurniawan, A. King, D.G. Evans, and P. Blenkhorn. Design and user
evaluation of a joystick-operated full-screen magnifier. In Proceedings of
the SIGCHI conference on Human factors in computing systems, pages
25–32. ACM, 2003.
[17] A. Lahiri, S.J. Chattopadhyay, and A. Basu. Sparsha: A comprehensive
indian language toolset for the blind. In Proceedings of the 7th inter-
national ACM SIGACCESS conference on Computers and accessibility,
pages 114–120. ACM, 2005.
[18] K.F. Lee, H.W. Hon, and R. Reddy. An overview of the sphinx speech
recognition system. Acoustics, Speech and Signal Processing, IEEE
Transactions on, 38(1):35–45, 1990.
[19] Z. Liang, X. Song, Z. Zhu, and R. Liu. Design and implementation of a
new browser for blind person. Computer Engineering and Applications,
14:106–108, 2004.
[20] monty. visual aid. www.visuaide.com/monty.html.
[21] C. Pennington and K. McCoy. Providing intelligent language feedback
for augmentative communication users. Assistive Technology and Arti-
ficial Intelligence, pages 59–72, 1998.
[22] T.V. Raman. Emacspeak: a speech interface. In Proceedings of the SIGCHI
conference on Human factors in computing systems: common ground,
pages 66–71. ACM, 1996.
[23] synapse homepagereader. synapse adaptive homepage reader,
http://www.synapseadaptive.com/wynn/ibm home page reader.htm.
[24] P. Verma, R. Singh, A.K. Singh, V. Yadav, and A. Pandey. An enhanced
speech-based internet browsing system for visually challenged. In
Computer and Communication Technology (ICCCT), 2010 International
Conference on, pages 724–730. IEEE, 2010.
[25] B. Wentz and J. Lazar. Email accessibility and social networking. Online
Communities and Social Computing, pages 134–140, 2009.
[26] J. Xiao, G.N. Huang, and Y. Tang. An open source web browser
for visually impaired. Advanced Intelligent Computing Theories and
Applications. With Aspects of Theoretical and Methodological Issues,
pages 90–101, 2007.
