
Application of Word Prediction and Disambiguation to Improve Text Entry for People with Physical Disabilities (Assistive Technology)

Hisham Al-Mubaid
University of Houston-Clear Lake
Houston, Texas 77058, USA
Email: hisham@uhcl.edu

Ping Chen
University of Houston-Downtown
One Main Street
Houston, TX 77002, USA

Keywords: human-computer interaction, assistive technologies for disability, word prediction applications.

Biographical notes: H. Al-Mubaid received his PhD in Computer Science from the University of Texas at Dallas in 2000, and then worked at the State University of New York (SUNY) at Geneseo for one year. In August 2001, he joined the University of Houston-Clear Lake (UHCL), where he is currently an associate professor of Computer Science. Besides his main research interests in natural language processing and bioinformatics, he has interests in teaching, learning, and educational research. He is an active member of the UHCL Teaching Learning Enhancement Center (T-LEC). His research and publications have been primarily centered around natural language processing, bioinformatics, text mining, and data mining.
Dr. Ping Chen is an Associate Professor in the Department of Computer and Mathematical Sciences at the University of Houston-Downtown. His research interests include computational semantics and medical text mining. Dr. Chen received his BS degree in Information Science and Technology from Xi'an Jiao Tong University, his MS degree in Computer Science from the Chinese Academy of Sciences, and his PhD degree in Information Technology from George Mason University.

1. Introduction
Computers and computer-based technologies are becoming an integral and important part of the lives of most people, including those with disabilities []. Improving and enhancing textual information entry for disabled users has been investigated for many years, and many approaches, systems, and interface devices have been proposed and implemented to facilitate and simplify text input for such users.
This paper describes our project targeting people with physical disabilities and motion impairments such as cerebral palsy, muscular dystrophy, spinal injuries, and other muscular deficiencies. One of the main difficulties such people face in interacting with computers is that their text entry is very slow, and the typing process can be tiring. This problem has been addressed in two ways: (a) by improving the design of the input interface, usually by introducing new devices (head-stick, head-pointer, eye-tracker, stylus touch pad) [, , , , ]; and (b) by reducing the number of input keystrokes needed to produce a given text, using the existing regular input devices [, ].
In this paper, we describe how to improve textual information entry through word prediction, as an assistive technology for people with motion impairments, using the regular input device, the keyboard. This eliminates the overhead and cognitive load of learning and becoming familiar with new, specialized input devices. Since the keyboard is the de facto universal interface for most general computing devices, it is more appealing to invent new techniques that allow all users to interact with the keyboard faster and more easily. Moreover, entering text with a regular keyboard, without the need for any special hardware or devices, enables such users to continue using everyday off-the-shelf computer applications and programs, such as email programs and word processors.
In word prediction, the system predicts, at each keystroke, the most likely string being typed, and then allows the user to verify the prediction with a single keystroke, so that a number of keystrokes can be eliminated. The drawback of this method is the cognitive load of reviewing multiple completion suggestions and selecting the best one. Minimizing cognitive load is extremely important when designing user interfaces and tools for disabled users [, ]. Research has shown that cognitive processing is usually slower in motion-impaired users; for example, motion-impaired users need additional time to plan and control their physical movements []. Another cognitive concern is that some people cannot handle the decision of selecting from a list of words and can easily get confused by inappropriate suggestions [], or they may even be unable to retain their intended text when offered a list to choose from [].
A robust word prediction system can benefit regular users as well, by reducing the number of keystrokes and by minimizing typographical errors and misspellings, which have become increasingly common. This aspect has been addressed by numerous software and hardware products, such as the Pocket PC device iPaq and the open-source word processor OpenOffice, which provide, along with standard word processing features, word auto-completion and auto-correction []. Nevertheless, the quality of the predicted words is crucial for saving keystrokes, since too many or too poor predictions will instead distract users and increase their cognitive load. For disabled users, reducing typographical errors and misspellings makes their writing more comprehensible, thus increasing their contribution and productivity. For people with reduced cognitive skills or learning disabilities, word prediction can assist in constructing well-formed texts and thus can enhance their English literacy and writing skills []. One research study on the benefits of word prediction [] showed that word prediction assisted students with impaired language skills in entering correct word forms and enhanced their overall word fluency; the students also used a larger vocabulary and produced more text. Furthermore, word prediction can benefit other natural language processing applications, such as context-sensitive spelling correction, part-of-speech tagging, and word sense disambiguation [].

1.1 Word Prediction and Completion


We address the problem of word completion from a new perspective: integrating supervised and adaptive learning to enhance text entry for physically disabled users with minimized cognitive load. A word prediction system offers, after each keystroke, a number of suggestions, allowing the user to select one or continue typing. The process of browsing and reading the suggested words imposes an extra cognitive load on the user, and the larger the number of suggestions, the greater the cognitive load. This research focuses on minimizing the cognitive load by offering, in most cases, only one suggestion, and never more than three.
The main contribution of this work is the integration of supervised learning with adaptive learning to produce robust learning techniques capable of performing word prediction with high accuracy, achieving maximum keystroke savings with minimum cognitive load. The value of this project can be viewed in three aspects. The first is its universality: it uses the keyboard, the universal text entry device for computers, so no training is needed and no new devices are introduced. The second is its focus on reducing cognitive load by offering, in most cases, only one suggestion for word prediction. The third is the positive impact of machine learning integration on other fields such as Human Computer Interaction (HCI).
The proposed prediction techniques are based on machine learning methods. In natural language processing research, most efficient and robust systems are designed and developed based on learning and training approaches []. Such learning approaches typically produce knowledge bases or trained models that can subsequently be employed in the underlying natural language processing task. These supervised learning approaches can be distinguished from adaptive learning approaches, in which learning modules learn from the particular user or domain as the system is being used, producing user/domain-specific knowledge. The following steps summarize our proposed method:
(1) Developing machine learning algorithms that focus on the syntactic information in texts.
(2) Developing adaptive learning methods that focus on the particular user/domain-specific information.
(3) Integrating the two learning mechanisms of (1) and (2) into a robust learning model that can easily be applied to other tasks similar to the one described in (4).
(4) Applying the model of (3) to the task of word prediction and completion, with the goal of generating a very small number of suggestions. This offers a faster and more efficient way of interacting and communicating with the computer without much extra cognitive load.
(5) Making the devised system available to physically disabled users to test and evaluate the efficiency and effectiveness of the approach.

2. Previous and Related Work on Word Prediction and Completion


Word prediction is a typical natural language processing task in which we need to determine or predict the correct or most suitable word in a given context. The context usually consists of the already-typed characters (prefix) of a word and the preceding words. The word prediction task can be employed in many applications, for example, word completion utilities and writing aids [, , ]. Statistical and similarity-based approaches have done quite well in tackling this problem, as well as similar problems such as word sense disambiguation. A common approach to such disambiguation-like problems is to train and use word bigram or n-gram models.
The task of helping people interact with computers has been investigated for a long time, and many tools, techniques, and devices have been proposed to help users with special needs (physical deficiencies) use computers successfully. Several research projects (e.g., [] and []) have worked on developing tools and solutions that assist disabled users in using computers and provide them with effective and efficient means of interaction, eventually broadening the participation of disabled and elderly people in the fast-growing information technology world []. In general, the problem of facilitating computer interaction has been addressed as follows:
(a) Improving the design of the input interface, usually by introducing new devices (head-stick, head-pointer, eye-tracker, stylus touch pad) [, , , , ]. New devices incur extra hardware cost and usually require special training, which often hurts their adoption by disabled users.
(b) Reducing the input keystrokes needed to produce a given text using the existing regular input devices []. The approaches in this class can be classified as follows:
• Using abbreviation and compression, that is, allowing the user to input an abbreviated or compressed form of the text, for example by dropping all vowels when entering text [, ]. The drawback of this method is the large cognitive load associated with learning the abbreviation and compression rules [, ].
• Using semantic primitives and icons. In this method, semantic concepts are represented as icons, where each icon has multiple meanings depending on its context; the user can then construct a sentence by combining a number of icons [, ]. Like the first method, this one involves a considerable cognitive load for learning and memorizing icon sequences to form useful sentences; the users are also limited to a fixed set of icons and sequences.
• Using word prediction and completion. This is one of the most viable alternatives in this category [, , , , ]. Some researchers have found that the layout and order of the keys on the keyboard are crucial to the performance of disabled people on the computer; the keyboard layout has a great effect on the access time and overall typing rate for disabled people []. A number of text entry techniques and devices have been developed to decrease the number of keystrokes, most of them for mobile phones. For example, Letterwise [] is such a technique developed for entering text with a phone keypad; it decreases the total number of keystrokes compared with conventional text entry on a 12-key keypad, where the three to four letters grouped on each key result in ambiguity. In fact, word prediction is very popular in handheld devices that provide full text features but have a very small keypad [, ].
Methods for word prediction can be classified into statistical methods, which are based on statistical and probabilistic language models, and syntactic methods, in which syntactic information is extracted and exploited. Most existing methods employ statistical language models using word n-grams and part-of-speech (POS) tags. Generally, syntactic and lexical language models have not been as beneficial for language prediction or disambiguation tasks as statistical language models. Word prediction work has been heavily dependent on statistical language modeling [, ]; in fact, almost all of the previous work on predictive text entry uses statistically driven knowledge and language models []. Machine learning has not been extensively investigated thus far. Although statistical language techniques can be robust in computing the suggestions in word prediction, machine learning can assist in (re-)ranking and reducing the number of suggestions.
The predictive text entry approach presented in [] is one of the very few projects not based on a statistical language model. Instead, this method uses a fairly large knowledge base (a semantic network), called Open Mind CommonSense (OMCSNet), of common-sense sentences entered by users across the web. When evaluated against similar statistics-based predictive techniques, this method produced competitive results []. The main pitfall of this method is its reliance on a huge semantic network that requires many years and thousands of contributors to be complete. Furthermore, semantic language models have performance issues with content versus function words and with polysemous versus non-polysemous words. The Reactive Keyboard, developed by Darragh et al. (1990), is one of the earliest predictive text entry systems used as a writing aid []. The Reactive Keyboard uses statistical language models, based on statistically driven text compression methods, to predict (and complete) the text being typed by a user. A statistical word prediction system called Profet was developed and used as a writing aid in Sweden []. When a user is typing with Profet, a list of up to 9 suggested words is presented, and the user can either accept a suggestion if the intended word appears in the list or continue typing. The main drawback of this technique is the large number of suggested words, which incurs a huge cognitive load on the users.
A review of prior related work in word prediction is presented by Fazly []. Fazly also presents a collection of experiments on word prediction applied to word completion utilities. The implemented and evaluated algorithms were word unigrams and bigrams, syntactic predictors using part-of-speech tags only, and combinations of tags and words. Training and testing were done on texts taken from the British National Corpus, with testing text sizes around one million words. Roughly speaking, tags-and-words predictors achieved the best overall performance, with a hit rate approaching 37% and keystroke savings around 53% (the hit rate is defined as the percentage of times that the correct word appears in the prediction list). The statistical relations between words and word sequences have also been exploited to improve word prediction performance []; that study showed that prediction performance is limited and very difficult to improve beyond a certain point using statistical techniques. Another interesting related work is the approach presented in [], which attempts to learn the contexts in which a word tends to appear, using an expressive and rich set of features. The features are introduced in a language as information sources, and the approach then augments local context information with global sentence information. The method was tested in several experiments with good results; about one million words taken from the WSJ corpus were used for training and testing. In summary, most existing techniques for word prediction are not designed with the goal of reducing cognitive load by minimizing the number of suggestions, and machine learning has not been extensively investigated.
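To make the two evaluation metrics above concrete, the following is a minimal sketch, in Python, of how hit rate and keystroke savings can be computed for a word completion utility. This is our own illustration rather than code from any of the cited systems: predict(prefix, k) is a hypothetical stand-in for an arbitrary prediction engine, and the hit rate here is a simplified per-word variant of the definition given above.

    def keystrokes_saved(word, predict, k=3):
        """Keystrokes saved for one word: once the word appears in the k-best
        list after i typed characters, the remaining characters are replaced
        by a single selection keystroke."""
        for i in range(1, len(word)):
            if word in predict(word[:i], k):
                return len(word) - i - 1  # subtract 1 for the selection key
        return 0

    def evaluate(words, predict, k=3):
        """Return (hit rate, keystroke savings) over a list of test words."""
        hits = sum(1 for w in words if keystrokes_saved(w, predict, k) > 0)
        saved = sum(keystrokes_saved(w, predict, k) for w in words)
        total = sum(len(w) for w in words)
        return hits / len(words), saved / total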

3. The Proposed Method


Machine learning approaches, as indicated in the literature, have shown significant robustness and efficiency in natural language processing []. In this research, two classes of learning methods are investigated, designed, and implemented: supervised learning methods and adaptive learning methods. These two classes of methods are then integrated into a comprehensive learning architecture capable of acquiring reliable and relevant knowledge automatically and efficiently. We apply the learned knowledge to the problem of word prediction for text entry (predictive text entry). The key objective is to allow the system to learn from prior (training) texts (supervised learning) and from the user (adaptive learning), so it can reliably predict the words the user intends to input.

3.1 Adaptive Learning


The notion of adaptation is not new, and adaptation techniques have shown significant success in improving the performance of many computer systems and applications. Adaptive learning approaches are usually more domain-specific and more intuitive than supervised learning. Clearly, natural languages are very large and complex, and for any sentence more than one, perhaps several, syntactic structures are possible. However, people tend to use only a limited subset of the syntactic structures of a natural language. Similarly, the lexicons used by individuals are extremely restricted and limited compared to the language's lexicon. Adaptive learning approaches exploit these aspects to provide solutions and answers for the particular user in the underlying application.
In our application, the adaptive learning paradigm learns the user's specific writing style and word usage to assist in word prediction. Adaptive learning records and maintains certain user-specific information that benefits the word prediction process. This learning is ongoing and online, continuing as the user uses the system, and it adjusts and adapts the prediction to the user's style and word usage. The recorded information includes, but is not limited to, frequency and recency information on the user's lexicon of words and tokens (unigrams), the user's lexicon (dictionary) of bigrams, n-grams, and special phrases, usage of punctuation and special symbols, and other word/token patterns in general.
For example, suppose the user is entering the text:
“…..yesterday I called my d_”.
After the ‘d’ is typed, the method may offer two possible suggestions, ‘doctor’ and ‘daughter’, based on learning from the training text. Adaptive learning assists in this situation by giving more likelihood to one of the two suggestions (for example, ‘doctor’) based on the word usage frequency and recent usage history of this user. Notice that this is learned after the training.
Another example: suppose the user is typing the text:
“ …she came and visited us last w_”.
Here, the system predicts the completion for w_ and suggests two words: week and winter. The adaptive learning module, however, favors the suggestion ‘week’, for example, based on this user's word recency usage.
Thus, the adaptive learning method (a minimal sketch of such a tracker follows this list):
• Records each word used by the user. For example, when two words w1 and w2 are computed as suggestions in a given prediction instance, the adaptive learning module may find that, based on recency and frequency information, w2 is most likely the right suggestion to offer.
• Records the phrases most commonly and most recently used by the user.
• Keeps track of word usage in the current text entry session.
• Keeps track of word usage recency.
• Records the usage of punctuation and special symbols.
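As an illustration only (the paper does not prescribe a particular data structure), such a frequency-and-recency tracker could be sketched in Python as follows; the class name, the decay function, and the recency weight are our own assumptions.

    import time
    from collections import Counter

    class AdaptiveTracker:
        """Tracks per-user word frequency and recency to rerank suggestions."""

        def __init__(self, recency_weight=0.5):
            self.freq = Counter()   # lifetime word counts for this user
            self.last_used = {}     # word -> timestamp of most recent use
            self.recency_weight = recency_weight

        def record(self, word):
            """Call whenever the user commits a word (typed or accepted)."""
            w = word.lower()
            self.freq[w] += 1
            self.last_used[w] = time.time()

        def score(self, word):
            """Combine frequency with a recency bonus that decays over time."""
            w = word.lower()
            age = time.time() - self.last_used.get(w, 0)
            recency = 1.0 / (1.0 + age)   # near 1 for just-used words
            return self.freq[w] * (1.0 + self.recency_weight * recency)

        def rerank(self, suggestions):
            """Order candidate completions by the user-specific score."""
            return sorted(suggestions, key=self.score, reverse=True)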

3.2 Supervised learning


In this project we investigate two supervised learning methods: Support Vector Machines (SVM) and Lsquare. SVMs have proven to be a very powerful learning and classification method for high-dimensional datasets; Joachims and others explored the use of SVMs in text categorization with large numbers of features and produced excellent results []. Lsquare has also been applied to similar problems, such as detecting context-based spelling errors, and showed very promising results [, , , ]. Both methods use the vector space model and have a fast classification phase.

3.2.1 Support Vector Machines


The support vector machine (SVM) [, ] is one of the most powerful learning methods and enjoys considerable empirical and theoretical support. SVMs have proven successful in many applications such as object recognition [], face detection [], and text categorization [, ]. Given a data set consisting of two types of records, positive and negative, the SVM attempts to compute a separating hyperplane (a decision surface) with maximum margin between the points of the two sets. This phase constitutes the training, and the computed hyperplane is subsequently used to classify new records in the application phase. When various lines could be chosen as decision surfaces, the SVM method selects the middle element of the widest set of parallel lines. The best decision surface is determined by only a small set of (training) examples called the support vectors. Finding the optimal separating hyperplane with maximum margin can be generalized to non-separable cases by introducing the concept of a “soft margin”, and to nonlinear classifiers via kernel functions. The kernel functions map the training data into a higher-dimensional feature space, and the SVM constructs an optimal separating hyperplane in that space, which corresponds to a nonlinear classifier in the input space. To address issues with SVMs, such as back-off with n-grams (and with other language models), and other problems related to learning, including over-fitting, smoothing, and data imbalance, we can draw on the latest results from the machine learning community.
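As a hedged illustration of this training/application pipeline (not the exact configuration used in this project), a linear SVM over feature vectors such as those of Section 3.3 can be trained with a standard library like scikit-learn:

    from sklearn.svm import LinearSVC

    # X: one row per training example (e.g., the V+/V- vectors of Section 3.3);
    # y: class labels, e.g., 0 for 'doctor' contexts, 1 for 'daughter' contexts.
    X = [[0, 1, 1, 0], [1, 0, 0, 1], [0, 1, 0, 0], [1, 0, 1, 1]]  # toy data
    y = [0, 1, 0, 1]

    clf = LinearSVC()   # computes a maximum-margin separating hyperplane
    clf.fit(X, y)       # training phase

    # Application phase: classify a new context vector.
    print(clf.predict([[0, 1, 1, 1]]))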

3.2.2 Lsquare
The Lsquare method is a logic-based data mining tool for classification (with controlled error) proposed by Felici, Sun, and Truemper [, ]. Lsquare is a rule-based machine learning technique based on learning logic. It views the training data as logic formulas, and the resulting knowledge/classifiers are logic formulas as well. The logic formulas are represented as {0, +1, -1} vectors. The Lsquare system accepts as input two sets A and B of {0, +1, -1} vectors, all of the same length. An entry of +1 means that a certain fact (feature), say X, is known to hold, -1 means that fact X is known not to hold, and 0 means that it is unknown whether fact X holds. Lsquare converts this into a MINSAT system and solves it, computing and outputting a set of 20 Disjunctive Normal Form (DNF) logic formulas and 20 Conjunctive Normal Form (CNF) logic formulas. Each logic formula is capable of classifying any (new) vector as belonging to class A or class B, and each logic formula is called a separating set.
Each separating set classifies a vector as belonging to A by giving it a +1 vote or as belonging to B by giving it a -1 vote; the summation of these 40 votes is called the vote-total. To classify an arbitrary vector x of length n, Lsquare computes from the separating sets a vote-total for x, which is an even integer ranging from -40 to 40. Moreover, Lsquare guarantees that for each vector x of A (resp. B) the vote-total is positive (resp. negative). When A and B are randomly drawn from two populations A and B, a vote-total close to 40 for a vector from A ∪ B means that the vector is in A with high likelihood and thus in B with very low probability. As the vote-total decreases from +40 toward -40, the probability of membership in A decreases while that of membership in B increases. For complete details on Lsquare, please refer to [, ]. The Lsquare method has several new properties, summarized as follows (a toy sketch of the vote-total scheme follows this list):
(1) A new way of encoding the representation of words. The features of words are extracted such that, for a given occurrence of a word w, the structure of the neighborhood of that occurrence is recorded in connection with other occurrences of w and their neighborhoods.
(2) A new way of encoding word features by learning “bad” features of words, which gives insight into how often a target word may appear in good as well as bad contexts.
(3) A new way of learning how correct word occurrences can be recognized from prior, correct text. This step uses the encodings of (1) and (2) and a machine learning algorithm, and it acquires knowledge that contains insight into the context-based syntactic representation of text words.
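The vote-total scheme described above is easy to illustrate. The following toy sketch assumes the 40 learned formulas are already available as Boolean functions over a {0, +1, -1} vector; it is our own illustration, not the Lsquare implementation.

    def vote_total(x, formulas):
        """Sum the +1/-1 votes of the separating sets for vector x. Each
        formula maps a {0, +1, -1} vector to True (class A) or False (class B)."""
        return sum(1 if f(x) else -1 for f in formulas)

    def classify(x, formulas):
        """Positive vote-total -> class A, negative -> class B; the magnitude
        (between -40 and 40 for 40 formulas) indicates the confidence."""
        t = vote_total(x, formulas)
        return ("A" if t > 0 else "B"), t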

3.3 Applying supervised learning to word disambiguation


We have successfully applied supervised learning methods to a word disambiguation task, which is then extended and adapted to the word prediction problem for text completion. Assume that we have two classes, C1 and C2, of training examples extracted from large text corpora. C1 contains examples of a word t1 and their contexts, whereas C2 contains examples of the word t2 with their contexts, and (t1, t2) are the two words we want to disambiguate for word prediction. For example, the user is typing “…yesterday I called my d_” and we want to predict/disambiguate “doctor” from “daughter” in this context so as to suggest it to the user as a word completion. The words preceding and following the <term> (i.e., t1 or t2) are its context words, so each example in the set C1 or C2 can be represented as follows:

pn … p3 p2 p1 <term> f1 f2 f3 … fn

where the words p1, p2, p3, …, pn and f1, f2, f3, …, fn are the preceding and following words (context) surrounding the term, and n is called the window size. We extract all the context words W = {w1, w2, …, wm} from the examples in the sets C1 and C2. (From now on we use wi to represent these surrounding words instead of pi and fi.)

Context word wi     X2
activate            3.9
process             3.6
sample              3.2
deliver             2.6
inhibit             1.9
went                1.9
generate            1.8
smear               1.8
diagnose            1.6
clear               1.4
…                   …

Table 1: Words with the top X2 values

Now, each such context word wi (wi ∈ W) may occur in contexts of either or both of C1 and C2 with different frequency distributions. We want to determine to what extent an occurrence of wi in an ambiguous example suggests that this example belongs to C1 or C2. Thus, we select from W only those words wi that are highly associated with either C1 or C2 (the highly discriminating words) as features (word features). We utilize feature selection techniques such as mutual information (MI), chi-square (X2), and Information Gain (IG) [, , ] to select the highly discriminating context words from W. When using X2 for feature selection, for example, we calculate X2 values for all wi ∈ W, then choose the top k words with the highest X2 values as (word) features in this term's feature vectors. In our preliminary work we tested k values of 10, 20, 30, 40, and 200. For example, if k = 10, then each training example is represented by a vector of 10 entries, such that the first entry represents the word with the highest X2 value, the second entry represents the word with the second highest X2 value, and so on. Then, for a given training example, a feature vector entry is set to a value V+ if the corresponding feature word occurs in that training example and to a different value V- otherwise, where V is the X2 value of that context word. Thus, if we want to utilize the 20 most discriminating words as features to represent each term, the feature vector size will be 20. Consider the following example: let W = {w1, w2, …, wm} be the set of all context words. We compute X2 for each wi ∈ W and sort the words of W by their X2 values in descending order, as in Table 1.
The top 10 words will be used to compose the feature vectors for training or testing examples of the terms to be disambiguated. For example, the feature vector:

V- V+ V+ V- V- V- V+ V- V- V-

represents an example containing the 2nd, 3rd, and 7th feature words (i.e., process, sample, and generate) around the term within a certain window size. That is, three of the 10 feature words occur in the context (e.g., a window of size 4) of the term. Notice that we do not encode positional information of the feature words in the feature vector; for example, the word ‘generate’ occurred as the third preceding word in the term's context, but it translates to a “V+” in the seventh entry of the feature vector.
Selecting only features with high discriminating capability results in better disambiguation quality than existing methods, as shown in our previous work [, , , ]. Furthermore, we have found that choosing appropriate values for V- and V+ can improve the disambiguation accuracy. If we simply set V- to 0 and V+ to 1 (the vector above becomes [0, 1, 1, 0, 0, 0, 1, 0, 0, 0]), we achieve an accuracy rate of about 80%, which is already better than existing methods. When we set V- and V+ to -MI and MI respectively (the vector becomes [-3.9, 3.6, 3.2, -2.6, -1.9, -1.9, 1.8, -1.8, -1.6, -1.6]), we achieved accuracy and recall rates approaching 99%. Clearly, when the machine learners (e.g., SVM, C4.5) encounter such training examples, they fine-tune and calibrate their learning mechanisms to these highly distinguishing feature values.
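The following is a minimal sketch of this feature construction, assuming a generic chi-square scorer (here scikit-learn's chi2 over simple count features) and toy data, rather than the exact implementation used in our experiments:

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import chi2

    # Toy training contexts for the two classes (e.g., 'doctor' vs. 'daughter').
    contexts = ["called my d about the test results", "my d came home from school",
                "saw my d at the clinic", "my d finished her homework"]
    labels = np.array([0, 1, 0, 1])   # 0 = doctor, 1 = daughter

    # Count context-word occurrences, then score each word with chi-square.
    vec = CountVectorizer()
    X = vec.fit_transform(contexts)
    scores, _ = chi2(X, labels)

    # Keep the top-k discriminating words together with their X2 values.
    k = 5
    top = np.argsort(scores)[::-1][:k]
    features = [(vec.get_feature_names_out()[i], scores[i]) for i in top]

    def encode(context, features):
        """Build the V+/V- vector: +X2 if the feature word occurs, -X2 otherwise."""
        words = set(context.split())
        return [v if w in words else -v for w, v in features]

    print(encode("my d was at the clinic", features))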

3.4 Extending the disambiguation method to word prediction


The following steps describe how we extend our word disambiguation method to word prediction in the context of improving text entry:
• Using only preceding words as (context) features. In word disambiguation, we can utilize both preceding and following words in the feature vectors, but in word prediction we only know the preceding words when predicting the next word. We will investigate how effective these preceding words are for word prediction.
• Formulating two-way disambiguation as multi-way prediction. Unlike gene/protein disambiguation, where there are only two classes, multiple reasonable words may exist for a given set of preceding words. We will investigate how to adapt our disambiguation method to work with multiple potential completions.

Figure 1: Word prediction phase

The word prediction problem can be stated formally as follows. Given a context of k preceding words C = {p1, p2, …, pk} and a given set S of n suggestions S = {s1, s2, …, sn}, we would like to compute, for the context C, the most likely si ∈ S. We divide word prediction into three steps:
(1) Collecting feature vectors for common English words. The supervised learning algorithms learn to disambiguate the words si ∈ S in a given context. This is done by learning the original contexts of these words in the training text: we extract the most important and most relevant features of such words from the training texts so that the best suggestion in a given context can be found efficiently during the application phase.
(2) Adjusting these feature vectors by learning from a user's existing documents. Adaptive learning methods will be investigated to learn a user's preferences and writing style. The feature vectors obtained from general corpora through supervised learning are adapted accordingly to better predict words for each individual user.
(3) Ranking and displaying the predicted words. In this application step, given a context, a feature vector is computed for each of the suggestions; the suggested words are then ranked and displayed for the user to choose from.

3.4.1 Ranking and displaying the predicted words


In prediction module I (shown in Figure 1), a number of suggestions are computed based on word frequency and trained word bigram and n-gram models. For example, if the user types the letter ‘c’, then based on the results of supervised learning this module may output the following suggestions for completion: class, could, can, cleared, current, clean, came, cloudy, etc. If the text entered is ‘cl’, the module produces class, cleared, clean, cloudy. Next, the second module, Module II, selects among these suggestions (class, cleared, clean, cloudy) the most likely few, based on the user's writing preferences and word usage history, and offers them to the user as suggestions for completion. The second module takes a number of words, a context, and a user's word usage history as input and computes the most likely candidates; the context, in this case, consists of the preceding words and tokens. Thus, Module II plays an important role in reducing the number of suggestions. We will investigate the impact of the number of suggestions on whether the desired word is included, which affects how much cognitive load the user incurs.
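The sketch below shows the two modules under simple assumptions: Module I filters a frequency-ranked vocabulary by the typed prefix, and Module II reranks by a user-specific score such as the one maintained by the adaptive tracker sketched in Section 3.1. Both the data and the scoring function here are illustrative only.

    def module_one(prefix, vocab_freq, max_candidates=20):
        """Module I: candidates matching the typed prefix, ranked by corpus frequency."""
        matches = [w for w in vocab_freq if w.startswith(prefix)]
        return sorted(matches, key=vocab_freq.get, reverse=True)[:max_candidates]

    def module_two(candidates, user_score, max_suggestions=3):
        """Module II: rerank by the user's profile and keep at most three
        suggestions, per the cognitive-load goal of this project."""
        return sorted(candidates, key=user_score, reverse=True)[:max_suggestions]

    # Usage with a toy vocabulary and a toy user profile.
    vocab_freq = {"class": 900, "cleared": 120, "clean": 300, "cloudy": 80, "could": 2000}
    user_counts = {"clean": 12, "class": 3}
    print(module_two(module_one("cl", vocab_freq), lambda w: user_counts.get(w, 0)))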

4. The Word Prediction System Architecture


The word prediction system consists of a supervised learning module that generates the general knowledge base, an adaptive learning module that generates the personal knowledge base, and a prediction and interface module integrated with text editors (shown in Figure 2).

4.1 Supervised learning module
The supervised learning module consists of machine learning algorithms that acquire supervisory information from training data. In the training phase, the information (for example, word and token features) extracted from the training texts is crucial to the performance of the application. As discussed earlier, feature vectors are extracted from the training text to produce a large-scale knowledge base. Building such a knowledge base is continuous and time-consuming work: as shown in our previous experiment, collecting feature vectors for 200,000 gene/protein instances took about 150 hours on a Pentium 4 computer (3.0 GHz CPU, 1 GB memory). Acquiring feature vectors for common English words is therefore feasible for this project, although training with more corpora will improve the quality of the feature vectors. As more and better feature vectors are collected, we will release new versions of the knowledge bases regularly. The knowledge base will be designed in a “plug and play” fashion for easy updating and adoption by regular users. Besides building a larger knowledge base, another direction could be learning from specialized domains, such as medical or financial documents.

Figure 2: Word Prediction System Architecture

4.2 Adaptive learning module


The adaptive learning module will be integrated into the prediction system so that it contributes to the decision-making process of word prediction. By and large, adaptive learning collects specific and customized knowledge from a user or a specialized domain into a local personal knowledge base (shown in Figure 2). Adaptive learning adds another dimension that allows for exploring and exploiting the available inputs and resources of a given problem. Yet another dimension can be added in the adaptation process: in our system, for example, we may distinguish between the word frequency and recency information extracted from the current text and that from prior texts of the same user, and assign different weights to these different frequencies (illustrated in the sketch below). In this way we can customize and adapt to the peculiarities of the user. The adaptive learning module is a key component of this project, particularly for the goal of minimizing cognitive load by generating only a few suggestions for completion in most cases.
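As a small illustration of this weighting idea (the weights below are arbitrary assumptions, not tuned values):

    def user_word_score(word, session_freq, prior_freq, w_session=3.0, w_prior=1.0):
        """Weight counts from the current session more heavily than counts from
        the user's prior texts, so recent usage dominates the prediction."""
        return (w_session * session_freq.get(word, 0)
                + w_prior * prior_freq.get(word, 0))

    # 'week' was used twice in this session; 'winter' often in older texts.
    session_freq, prior_freq = {"week": 2}, {"week": 5, "winter": 9}
    print(user_word_score("week", session_freq, prior_freq))    # 11.0
    print(user_word_score("winter", session_freq, prior_freq))  # 9.0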

4.3 Interface module


We will design our text prediction and completion software to be used with various editing applications, such as text editors, word processors, Web browsers, and email clients; a sample interface is shown in Figure 3. A user can type a number to select the desired word in the list. We will implement our software to run as a background service component that can be easily started or stopped by users, so both public computers and personal computers can adopt our system. Another requirement is that the suggestions be generated in real time; based on our experience, the SVM's classification phase is fast enough to produce the up to three predicted words in real time for our project.

Figure 3: Suggested words are shown as a user types in an editor

5. Experimental Results and Evaluation


We have applied our method to disambiguate gene and protein names in the medical literature []. Many genes and proteins share the same names, and disambiguating them is sometimes difficult even for biomedical experts, so authors often have to explicitly put “protein” or “gene” before or after protein or gene names. These explicitly disambiguated protein or gene instances are used for automatic evaluation of our disambiguation technique.
Table 2 shows our experiments using 20,000 Medline abstracts from the year 2002. The Medline database contains a huge collection of biomedical documents (about 16 million research abstracts dating back to 1950). Within these abstracts we found 6,106 occurrences of gene/protein names. We set V- = 0 and V+ = 1 in the feature vectors.

Method           Training set   Testing set   Accuracy   Precision   Recall   Incorrect
                 size (80%)     size (20%)                                     instances
X2   20 fields   4885           1221          85.0%      86.3%       96.7%    183
     30 fields   4885           1221          85.5%      87.0%       96.2%    177
     40 fields   4885           1221          86.4%      87.9%       96.3%    166
MI   20 fields   4885           1221          80.4%      81.1%       98.4%    239
     30 fields   4885           1221          80.2%      81.2%       98.0%    242
     40 fields   4885           1221          80.2%      81.2%       97.9%    242

Table 2: Results of 5-fold cross-validation experiments using SVM and vectors of 20, 30, or 40 fields

Year  Classifier      Method       Training   Testing    Accuracy   Precision   Recall    Incorrect
                                   set size   set size                                    instances
2004  SVM             X2  2-fold   25593      25593      98.55%     96.84%      100%      372
                          5-fold   40950      10236      99.37%     98.61%      100%      64
                      MI  2-fold   25593      25593      99.76%     100%        99.45%    63
                          5-fold   40950      10236      99.87%     100%        99.69%    13
      Naïve Bayes     X2  2-fold   27306      27306      99.0%      98.9%       99.0%     315
                          5-fold   43690      10922      99.0%      98.9%       99.0%     318
                      MI  2-fold   27306      27306      99.0%      98.9%       99.0%     315
                          5-fold   43690      10922      99.4%      99.4%       99.4%     178
      Decision Table  X2  2-fold   29337      29337      100%       100%        100%      0
                          5-fold   46939      11735      100%       100%        100%      0
                      MI  2-fold   29337      29337      100%       100%        100%      0
                          5-fold   46939      11735      100%       100%        100%      0

Table 3: Word disambiguation experiment results

We performed 5-fold cross validation with SVM and obtained good accuracy, precision, and recall rates. With different vector sizes, our technique shows very good and stable performance, supporting the effectiveness of the method. We note in this experiment that X2 gives better results than MI, and the best results are obtained with X2 and vectors of size 40.
In another evaluation (Table 3), we collected all of the 600,000 medical document abstracts available in Medline published from 2001 to 2004. From these abstracts we extracted about 200,000 unambiguous gene and protein name instances; these unambiguous instances were explicitly disambiguated by the authors with “gene” or “protein” immediately before or after each instance. In the learning phase the explicit “gene” or “protein” is used as the class label, but in the testing phase “gene” or “protein” is deleted from the documents. For each instance, its context of 10 preceding and 10 following words (i.e., window size 10) was extracted, enabling testing on windows of any size up to 10. V- is set to -MI or -X2, and V+ to MI or X2, in the feature vectors. Both 2-fold and 5-fold cross-validations were done using three classification methods: SVM, Decision Table, and Naïve Bayes. The experimental results for all four years of data (2001-2004) are similar, so we show only the results for the year 2004 in Table 3 due to space limitations. The accuracy, precision, and recall rates approach or exceed 99%, indicating that our method is highly effective.
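For readers who wish to reproduce a comparable protocol, the following is a minimal sketch of 5-fold cross-validation with an SVM using scikit-learn; the randomly generated data merely stands in for the real feature vectors and labels.

    import numpy as np
    from sklearn.model_selection import cross_validate
    from sklearn.svm import LinearSVC

    # Placeholders for the V+/V- feature vectors (40 fields) and the class
    # labels (0 = gene, 1 = protein) extracted from the Medline abstracts.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 40))
    y = rng.integers(0, 2, size=1000)

    scores = cross_validate(LinearSVC(), X, y, cv=5,
                            scoring=("accuracy", "precision", "recall"))
    print({k: v.mean() for k, v in scores.items() if k.startswith("test_")})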

6. Conclusion
We presented a word prediction method as an assistive technique for people with physical disabilities. The proposed method allows faster text entry and more accurate text communication with computers, an aspect that is highly appreciated by and appealing to people with reduced mechanical and motion abilities. One of the main issues tackled by this research is reducing cognitive load. The proposed techniques incur minimal cognitive load on users for two reasons. First, unlike many other assistive technologies, we do not introduce any new gadgets or devices that require training and learning. Second, the word prediction and completion method generates very few suggestions for text completion, so the user need not go through a long list of suggested words, which could even lead to losing the intended word in mind.

References
1. H. Al-Mubaid, Context-Based Word Prediction and Classification. Proc. of the ISCA International
Conference on Computers and Their Applications CATA'03, 2003. Honolulu, Hawaii USA. pp 385-388.

2. H. Al-Mubaid and S. Umair, “A New Text Categorization Technique Using Distributional Clustering and
Learning Logic”. IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 18, no.9, pp.
1156-1165, September 2006
3. H. Al-Mubaid and K. Truemper, Learning to Find Context-Based Spelling Errors. In Data Mining and
Knowledge Discovery Approaches Based on Rule Induction Techniques, E. Triantaphyllou and G. Felici
(Editors). Massive Computing Series, Springer, Heidelberg, Germany, pp. 597-628, 2006.
4. H. Al-Mubaid, P. Chen, “Biomedical Term Disambiguation: An Application to Gene-Protein Name
Disambiguation”, Third International Conference on Information Technology: New Generations, April
2006, Las Vegas, Nevada
5. M. Baroni, Matiasek, J. and Trost, H. Wordform- and class-based prediction of the components of German nominal compounds in an AAC system. Proceedings of COLING 2002. pp. 1-7.
6. J.R. Bellegarda, “Large Vocabulary Speech Recognition with Multispan Statistical Language Models”.
IEEE Transactions on Speech and Audio Processing 8(1):76-84, 2000.
7. V. Blanz, B. Scholkopf, H. Bulthoff, C. Burges, V. Vapnik, and T. Vetter. Comparison of view-based
object recognition algorithms using realistic 3d models. In Artificial Neural Networks ICANN-96, Berlin,
Germany, 1996.
8. B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. Fifth annual
workshop on Computational Learning Theory. 1992. pp144-152.
9. A. Carlberger, Carlberger J., Magnuson T., Hunnicutt S., Palazuelos-Cagigas S., and Navarro S. Profet, a new generation of word prediction: An evaluation study. In Copestake A., Langer S., and Palazuelos-Cagigas S. (editors), Natural language processing for communication aids, Proceedings of a workshop sponsored by ACL, Spain, 1997.
10. P. Chandler, Sweller, J. Cognitive load theory and the format of instruction. Cognition and Instruction, 8,
pp. 293-332. 1991.
11. P. Chen and H. Al-Mubaid, Context-based Term Disambiguation in Biomedical Literature. The 19th International FLAIRS Conference, May 2006, Melbourne, Florida.
12. A. Cockburn and Amal Siresena. Evaluating mobile text entry with the Fastap Keypad. People and Computers XVII (Volume 2): British Computer Society Conference on Human Computer Interaction. Bath, England. 2003, pp. 77-80.
13. A. Copestake, Augmented and alternative NLP techniques for augmentative and alternative
communication. Proc. of ACL workshop on natural language processing for communication aids. Madrid,
1997. pp. 68-78.
14. J.J. Darragh, I.H.Witten, and M.L. James. The Reactive Keyboard: A Predictive Typing Aid. IEEE
Computer (23) 11 pp.41-49, 1990.
15. P.W. Demasco and K.F. McCoy. Generating Text from compressed input: An intelligent interface for
people with severe motor impairment. Communications of the ACM. Vol. 35, No.5, May 1992. pp. 68-78.
16. Y. Even-Zohar and D. Roth, A classification approach to word prediction, NAACL’00, May 2000.
17. A. Fazly, The use of syntax in word completion utilities. Master thesis, University of Toronto, Canada,
2002
18. A. Fazly and Hirst, Graeme. Testing the efficacy of part-of-speech information in word completion.
Workshop on Language Modeling for Text Entry Methods, 11th Conference of the European Chapter of
the Association for Computational Linguistics, Budapest, April 2003. pp. 9-16.
19. G. Felici, F. Sun, and K. Truemper, A method for controlling errors in two-class classification, Proc. of
23rd annual int’l computer software and applications conference (COMPSAC 99), Phoenix, AZ, 1999. pp
186-191.
20. G. Felici, F. Sun, and K. Truemper. Learning logic formulas and related error distributions. In Data
Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques, E. Triantaphyllou
and G. Felici (Editors). Massive Computing Series, Springer, Heidelberg, Germany, pp. 597-628, 2006.
21. G. Foster, Pierre Isabelle, and Pierre Plamondon. Word Completion: A First Step Toward Target-Text
Mediated IMT. Proceedings of the 16th international conference on computational linguistics, 1996.
Denmark. pp 394-399.

22. N. Garay-Vitoria and J. Gonzalez-Abascal. Intelligent word prediction to enhance text input rate. Proceedings of the 1997 International Conference on Intelligent User Interfaces IUI'97, Orlando, FL, 1997. pp. 241-244.
23. E. P. Glinert and York B. W. Computers and people with disabilities. Communications of the ACM 35, 5, pp. 32-35. May 1992.
24. T. Joachims. Text categorization with support vector machines, European conference on machine learning,
ECML, 1998.
25. S. Keates, J. Clarkson, P. Robinson. Investigating the applicability of user models for motion-impaired
users. The Fourth International ACM SIGCAPH Conference on Assistive Technologies ACM ASSETS-
00, Arlington, Virginia. 2000.
26. C. Laine, and Bristow, T. Using manual word-prediction technology to cue students’ writing: Does it really
help?. Proceedings of the Fourteenth International Conference on Technology and Persons with
Disabilities, Los Angeles, CA, USA 1999. pp 243-247.
27. G. W. Lesher, Moulton B. J., and Higginbotham D. J. Techniques for augmenting scanning
communication. Augmentative and Alternative Communication 14, pp.81–101, 1998.
28. G. W. Lesher, B.J. Moulton, D.J. Higginbotham, and B. Alsofrom, Limits of human word prediction
performance, CSUN 2002, California State University, Northridge.
29. Lytras M. and Pouloudi A. (2006). Towards the development of a novel taxonomy of knowledge
management systems from a learning perspective: an integrated approach to learning and knowledge
infrastructures. Journal of Knowledge Management; Vol.10, No. 6. pp. 64-80
30. Lytras M., Pouloudi A. and Poulymenakou A. (2002). Knowledge management convergence-expanding
learning frontiers. Journal of Knowledge Management; Vol.6, No.1. pp 40-51
31. Naeve, M.; Lytras, M.; Nejdl,W.; Harding, J.; and Balacheff, N. (2006). Advances of Semantic Web for
E-Learning: Expanding Learning Frontiers. British J. Educational Technology, Vol.37, No.3, 2006.
32. S. MacKenzie, Kober H., Smith D., Jones T., and Skepner E. Letterwise: prefix-based disambiguation for
mobile text input. Proc. of ACM symposium on user interface software and technology UIST, 2001. pp
111-120.
33. B. Manaris, Valanne MacGyvers, Michail Lagoudakis. Universal Access to Mobile Computing. Proc. of 12th International Florida AI Research Symposium (FLAIRS-99), Orlando, FL, May 1999, pp. 286-292.
34. B. Manaris, Renée McCauley , Valanne MacGyvers. An Intelligent Interface for Keyboard and Mouse
Control. Proc. of 14th Int’l Florida AI Research Symposium (FLAIRS-01), FL, 2001, pp. 182-188
35. K.F. McCoy. Simple NLP techniques for expanding telegraphic sentences. Fifth Workshop on Very Large
Corpora, UPENN ACL, USA,1997. pp. 48-54.
36. OpenOffice project: http://www.openoffice.org/
37. E. Osuna, R. Freund, and F. Girosi. Training support vector machines: An application to face detection. In Proc. of IEEE CVPR-97, Puerto Rico, 1997. pp. 130-137.
38. L. Quiroga, Martha E. Crosby and Marie K. Iding. Reducing Cognitive Load. Proceedings of the 37th
Hawaii International Conference on System Sciences, 2004. pp 319-334.
39. B. Sheryl. The Role of Technology In Preparing Youth With Disabilities For Postsecondary Education And
Employment. Journal of special education 18(4) fall 2003. pp. 46-53.
40. S.M. Shieber and E. Baker. Abbreviated text input. Proceedings of the 2003 international conference on
Intelligent User Interfaces IUI-03 ACM, Miami, USA 2003. pp. 293-296
41. C. Stephanidis, Paramythis A., Sfyrakis M., Stergiou A., Maou N., Leventis A., Parapoulis G., and
Karagiannidis C. Adaptable and adaptive user interfaces for disabled users in the AVANTI project.
Proceedings of the 5th International Conference on Intelligence in Services and Networks (IS&N 98),
pp.153–166. 1998. Oslo, Norway.
42. C. Steriadis and Constantinou P. Designing Human-Computer Interfaces for Quadriplegic People. ACM
Transactions on Computer-Human Interaction, Vol. 10, No. 2, June 2003, pp. 87–118.
43. T. Stocky, A. Faaborg, H. Lieberman. “A Commonsense Approach to Predictive Text Entry.” Proc. of
Conference on Human Factors in Computing Systems. April 24-29, Vienna, Austria, 2004.
44. V. Vapnik. Statistical Learning Theory. John Wiley & Sons, New York, 1998. pp. 1163-1166.

45. J.O. Wobbrock, B.A. Myers, and J.A. Kembel. EdgeWrite: A stylus-based text entry method designed for
high accuracy and stability of motion. Proc. of ACM Symposium on User Interface Software and
Technology (UIST '03). Vancouver, B.C., November 2003. pp. 61-70

