
Atlas: A dialogue-driven chatbot based on an NLIDB

Computer Science Senior Thesis

Jon Abdulloev
Earlham College
Richmond, Indiana
aabdul15@earlham.edu
ABSTRACT
Many resources and opportunities in a community go unnoticed because they are not efficiently recorded and tracked. A relational database management system makes for an effective system to keep track of available resources. However, not every member of a community has the necessary SQL background to communicate with a relational database. In this paper, I present a chatbot called Atlas, which is based on a Natural Language Interface to Database (NLIDB) system. The purpose of Atlas is to democratize SQL while interacting with users in a user-friendly manner. I conduct experiments to test how accurately Atlas classifies natural language (English) sentences before converting them into SQL commands. The results of testing Atlas at a small liberal arts college suggest that if all community members are easily able to access and contribute to a SQL database containing information about available resources, then a more equitable distribution and a sustainable use of a community's resources can be achieved.

KEYWORDS
Natural Language Processing, Natural Language Understanding, Natural Language Interface to Database System, ChatBot

1 INTRODUCTION
In the past couple of decades, there has been a significant amount of research on Natural Language Processing (NLP), which has been largely motivated by its enormous range of applications. Some of the well-known systems that use NLP techniques include Siri from Apple [7], IBM Watson [6] and Wolfram|Alpha [11]. NLP is a broad topic with various subtopics, as shown by the Venn diagram in Figure 1, which is a visualization of the categorization of the subtopics of NLP.1

1 https://www.quora.com/What-is-the-difference-between-natural-language-processing-NLP-and-natural-language-understanding-NLU

Figure 1: Visualization of the scope of NLP

A Natural Language Interface to Database (NLIDB) is a type of database interface that allows the user to access the data using natural language [20]. There has been much research conducted in recent years on building efficient NLIDBs that allow people without SQL backgrounds to query a SQL database in natural language. The idea for Atlas was inspired by the work of a research team at Salesforce that developed Seq2SQL [24] with the aim of democratizing SQL so that databases can be queried in natural language.

In their book Artificial Intelligence: A Modern Approach [21], Russell et al. explain the four main historical approaches to AI:

(1) Thinking Humanly
(2) Thinking Rationally
(3) Acting Humanly
(4) Acting Rationally

The authors discuss what it means to be intelligent and hence introduce the Turing Test, which was proposed by Alan Turing (1950). A computer passes the Turing Test, which was designed to provide a satisfactory operational definition of intelligence, if a human interrogator cannot tell whether the written responses come from a person or from a computer. Russell et al. explain that programming a computer to pass a rigorously applied test involves possessing the following capabilities:

• natural language processing to enable it to communicate successfully in English;
• knowledge representation to store what it knows or hears;
• automated reasoning to use the stored information to answer questions and to draw new conclusions;
• machine learning to adapt to new circumstances and to detect and extrapolate patterns.

The modern study of linguistics and AI developed in the same century, intersecting in a hybrid field called computational linguistics or natural language processing. In order to make accurate sense of natural language, we need to concretely understand the subject matter and context, as well as the structure of sentences. Much of the early work in knowledge representation (the study of how to put knowledge into a form that a computer can reason with) was tied to language and informed by research in linguistics, which was connected in turn to decades of work on the philosophical analysis of language.

In this paper, I discuss Atlas, a dialogue-driven chatbot based on an NLIDB, which acts and thinks humanly. When connected to a user interface, Atlas allows users to add to and query a SQL database containing information about the events and resources (including tangible objects like cars, chargers, etc. as well as services like rides, tutoring, etc.) that are available within a community. Atlas is intended to interact with users in a user-friendly manner through its dialogues and responses. Section 2 presents several related works that inspired and influenced the work on Atlas. Section 3 explains how Atlas functions as a conversational agent dialog system. Section 4 discusses Atlas' system architecture while explaining the problem domain of natural language. Section 5 analyzes and discusses the results of testing Atlas at a small liberal arts college called Earlham College in Richmond, Indiana. Finally, Section 6 concludes and presents some future work for Atlas.
2 RELATED WORK

2.1 Dialogue Manager
Serban et al. present MILABOT, a deep reinforcement learning chatbot that is capable of conversing with humans on popular small talk topics through both speech and text [22]. Their proposed system consists of an ensemble of natural language generation and retrieval models. There are 22 response models in their system, including retrieval-based neural networks, generation-based neural networks, knowledge-based question-answering, template-based, bag-of-words and latent variable neural network models. These models take a dialogue as input and output a response in natural language text.

MILABOT's dialogue manager is responsible for combining the response models. As input, the dialogue manager expects a dialogue history (i.e. all utterances recorded in the dialogue so far, including the current user utterance) and confidence values from the automatic speech recognition system. To generate a response, the dialogue manager follows a three-step procedure. First, it uses all response models to generate a set of candidate responses. Second, if there exists a priority response in the set of candidate responses (i.e. a response which takes precedence over other responses), this response is returned by the system. Third, if there are no priority responses, the response is selected by the model selection policy.

While no machine learning techniques are currently in use in Atlas' system, the work of Serban et al. provides strong motivation to use neural networks to improve Atlas' dialogue manager.

Quarteroni et al. report on the design, implementation and evaluation of a chatbot-based dialogue interface for an open-domain QA system, showing that chatbots can be effective in supporting interactive QA [18]. The system is able to provide both factoid and complex answers such as definitions and descriptions. The dialogue interface's role is to enable an information-seeking, cooperative, inquiry-oriented conversation to support the question answering component. The authors explain that their chatbot dialogue follows a pattern-matching approach and is therefore not constrained by a notion of "state". When a user utterance is issued, the chatbot's strategy is to look for a matching pattern and fire the corresponding template response.

The authors design a dialogue manager that invokes external modules such as the followup recognition and QA components. The types of followup questions which the system is able to handle are elliptic questions, questions containing third-person pronoun/possessive adjective anaphora, and questions containing noun phrase (NP) anaphora (e.g. "the river" instead of "the world's longest river").

While the system proposed by Quarteroni et al. uses a pattern-matching approach to detect followup questions, Atlas' dialogue manager checks whether a given sentence is related to the preceding sentences through the use of the dialogue_id and dialogue_depth parameters, which are sent to the user interface client as part of a JSON object. Furthermore, their discussion of data-driven answer clarification, used to tie the dialogue component more closely into the structure of the QA system, is relevant to Atlas as future work.
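To make the exchange concrete, the sketch below shows the kind of JSON object Atlas could send to the user interface client; dialogue_id and dialogue_depth are the parameters described above, while the remaining field names are hypothetical placeholders rather than Atlas' actual payload format.

import json

# Illustrative reply object; only dialogue_id and dialogue_depth come from the
# description above, the other keys are assumed for this sketch.
reply = {
    "user_id": 42,             # which user Atlas is talking to
    "dialogue_id": 7,          # which dialogue tree this turn belongs to
    "dialogue_depth": 2,       # how far into that dialogue the conversation is
    "response": "What dates is the car available?",
}

# The client echoes dialogue_id and dialogue_depth back with the next utterance,
# which lets Atlas decide whether a new sentence continues the same dialogue.
print(json.dumps(reply))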
2.2 Information Extraction
Information extraction is defined as the extraction of information from a text in the form of text strings and processed text strings which are placed into slots labeled to indicate the kind of information that can fill them [3]. A wide range of approaches exist for parsing natural language sentences, including simple parts-of-speech tagging and context-free grammars, as well as more advanced techniques such as Lexical Functional Grammars, Head-Driven Phrase Structure Grammars, Link Grammars, and stochastic approaches. In this section we discuss two simple methods involving generic patterns and one advanced approach which uses Link Grammars.

Popescu et al. discuss an approach to modern NLIDBs which involves composing statistical parsing with semantic tractability [17]. They use the term semantic tractability to describe "easy-to-understand" questions whose words or phrases correspond to database elements or constraints on join paths. They introduce the PRECISE algorithm, which takes a lexicon and a parser as input. The authors define a lexicon as a 3-tuple (T, E, M), where T is the set of tokens, E is the set of database elements, wh-values2 and join paths3, and M is a subset of T × E, i.e. a binary relation between tokens and database elements. Given an English question, PRECISE maps it to one or more corresponding SQL queries using the attachment function F_{L,q}: T → T, where L is the lexicon, q is a question, and T is the set of tokens in the lexicon. Tokens can be characterized as linguistic units such as words, punctuation, numbers or alphanumerics [15].

2 wh-values correspond to wh-words such as what and where.
3 A join path is a set of equality constraints between a sequence of database relations.

Popescu et al. discuss their observations about sentence structures and reveal that nouns, adjectives, and adverbs in semantically tractable questions refer to database relations, attributes or values. The attributes and values in a question tend to "pair up" naturally to indicate equality constraints in SQL. These findings about sentence structures are very relevant to how Atlas maps nouns and adjectives/adverbs to the resource and descriptors attributes of the resources relation, respectively. However, much work needs to be done for Atlas to be able to capture implicit attributes in semantically tractable questions.

Etzioni et al. introduce KNOWITALL, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner [5]. KNOWITALL associates a probability with each fact, enabling it to trade off precision and recall. Each KNOWITALL module runs as a thread, and communication between modules is accomplished by asynchronous message passing. Its main modules are:
• Extractor: Whenever a new class or relation is added to KNOWITALL's ontology, the Extractor uses generic, domain-independent rule templates to instantiate a set of information extraction rules for that class or relation. It also uses the Brill tagger [2] to assign part-of-speech tags and identifies noun phrases with regular expressions based on the part-of-speech tags.
• Search Engine Interface: This involves automatically formulating queries based on the extraction rules, whereby each rule has an associated search query composed of the keywords in the rule.
• Assessor: The Assessor uses a form of point-wise mutual information (PMI) between words and phrases that is estimated from web search engine hit counts in a manner similar to Turney's PMI-IR algorithm [23]. For example, if the Extractor has proposed "Liege" as the name of a city and the PMI between "Liege" and a phrase like "city of Liege" is high, then this is evidence that "Liege" is a valid instance of the class City. The Assessor computes the PMI between each extracted instance and multiple phrases associated with cities. These mutual information statistics are combined via a Naive Bayes Classifier. Hence, in order to improve the precision of the information that is extracted from the web, statistics computed by querying search engines are used to assess the probability of the correctness of the Extractor's conjectures. The Assessor also measures co-occurrence statistics of candidate extractions with a set of discriminator phrases.
• Database: Etzioni et al. store information (including metadata such as the rationale for and the confidence in individual assertions) in a commercial RDBMS. They argue that the advantage of this is that the database is persistent and scalable, supporting rapid-fire updates and queries.

Atlas' noun phrase extraction is inspired by KNOWITALL's Assessor module, despite the lack of use of any machine learning techniques (such as a Naive Bayes Classifier in KNOWITALL) in Atlas' extractor. Atlas also uses pattern matching as the basis for extraction rule templates and adds syntactic constraints that look for simple noun phrases.
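As a minimal illustration of such syntactic constraints, the sketch below uses spaCy's noun chunker to pull out simple noun phrases and their head nouns; it assumes the small English model is installed and is not the exact extraction code used in Atlas.

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes: python -m spacy download en_core_web_sm

doc = nlp("I have a red car and an old phone charger")
# noun_chunks yields flat, non-overlapping noun phrases; chunk.root is the head
# noun, so the remaining tokens can be treated as descriptors of the resource.
for chunk in doc.noun_chunks:
    print(chunk.text, "->", chunk.root.text)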
2.3 Knowledge-Based NLP
Mahesh et al. explain that a knowledge base is an integral part of an NLP system due to its use in the extraction and acquisition of meanings from corpora. They show how general knowledge of the world together with domain-specific knowledge can be used to resolve syntactic and semantic ambiguities and make necessary inferences, a process which is very relevant to future implementations of Atlas [16].

Syntactic ambiguity can be of two kinds: scope ambiguity and attachment ambiguity. Scope ambiguity is where the scope of reference is unclear and open to interpretation. An example of scope ambiguity is "Look at the nice men and women". Scope ambiguity is the simpler case and can be resolved using syntactic linguistic knowledge. Attachment ambiguity arises from uncertainty about attaching a phrase or clause to a part of a sentence, and thus it occurs if a constituent fits more than one position in a parse tree. An example of attachment ambiguity is "The man saw the girl with the telescope".

The PP attachment problem is a standard type of attachment ambiguity which involves prepositional phrases (PP). The authors present a detailed solution to the common PP-attachment problem by applying selectional constraints on the selection of one of the possible attachments. The selectional constraints, which are derived from semantic and world knowledge, are used to compose the meanings of the two child units being attached with the meanings of each of the possible parent syntactic units. They explain that such selectional constraints are typically represented in the form of a permissible range of fillers for slots in frames representing the meanings. The potential filler, which refers to the meaning of the child unit, is compared to the selectional constraint by a fuzzy match function which computes a weighted distance between the two meanings in a semantic or ontological network of concepts. The closer the two meanings are in the network, the higher the score the function assigns to the choice. The algorithm then combines the scores from different constraints for the same choice by applying a mathematical function. The resulting combined scores are used to select the best choice according to the knowledge of selectional constraints.

Word sense ambiguity (lexical semantic ambiguity) is the problem of determining which sense (meaning) of a word is activated by the use of the word in a particular context, a process which appears to be largely unconscious in people.4 To resolve this type of ambiguity, the lexical meaning that best fulfills the constraints on the composition of the text's possible meanings is selected. World knowledge is used to derive selectional constraints on how meanings can be composed with one another [16]. In lieu of world knowledge, statistical methods can also be used to derive the constraints in narrow domains where sufficient training data on word senses is available. The process of checking selectional constraints is implemented as a search in a semantic network or conceptual ontology and can be expensive if large networks are involved. Therefore, constraints are only checked for those pairs of words in the text that are syntactically related to one another.

4 Edmonds, Philip, and Eneko Agirre. "Word Sense Disambiguation." Scholarpedia.

Integrating a knowledge base into Atlas' system will significantly improve its performance in capturing complex noun phrases, since a substantial amount of knowledge about the world and the domain of discourse can be used to make sense of given texts. For instance, given the sentence "Find restaurants with bars", Atlas' ability to capture the essence of the query (restaurant) will be improved with knowledge about the word bar. This is because knowledge of selectional constraints can be used to resolve the word sense ambiguity and determine that bar in this case is a liquor-serving place in order for it to be with a restaurant, instead of an oblong piece of a rigid material.
2.4 Using Clarification Dialogues to Resolve Ambiguities
Another approach to NLIDBs simply asks the user for clarification instead of using complex semantics-based techniques. Kaufmann et al. present Querix, a domain-independent natural language interface for the Semantic Web that resolves ambiguities in natural language queries to ontologies by using clarification dialogues [13]. The system consists of seven main parts: a user interface, an ontology manager, a query analyzer, a matching center, a query generator, a dialog component, and an ontology access layer.

The user interface allows the user to enter full NLQs and choose the ontology to be queried. When an ontology is chosen and loaded into Querix, the ontology manager enhances the resources' labels by obtaining synonyms from WordNet. The query analyzer uses the Stanford Parser [Klein and Manning, 2002] to generate a syntax tree for the NL query, from which the sequence of the main word categories Noun (N), Verb (V), Preposition (P), Wh-Word (Q), and Conjunction (C) is extracted and used to build a query skeleton. For example, the query skeleton for the query "What are the population sizes of cities that are located in California?" is Q-V-N-P-N-Q-V-P-N. The query analyzer also uses WordNet, which provides synonyms for all the nouns and verbs in the query's parse tree. The authors implement a cost function in order to obtain the most appropriate synonyms, as WordNet usually suggests too many. Querix then matches the query skeleton with the synonym-enhanced triples in the ontology.
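As a rough illustration of how such a skeleton can be derived, the sketch below maps spaCy's tags onto the word categories used by Querix; the mapping, and the collapsing of adjacent repeated categories, are assumptions made for this example rather than the procedure of Kaufmann et al.

import spacy

nlp = spacy.load("en_core_web_sm")

# Illustrative mapping from coarse part-of-speech tags to Querix-style categories.
CATEGORY = {"NOUN": "N", "PROPN": "N", "VERB": "V", "AUX": "V", "ADP": "P", "CCONJ": "C"}
WH_TAGS = {"WDT", "WP", "WP$", "WRB"}  # Penn Treebank wh-word tags

def query_skeleton(question):
    skeleton = []
    for token in nlp(question):
        if token.tag_ in WH_TAGS:
            cat = "Q"
        elif token.pos_ in CATEGORY:
            cat = CATEGORY[token.pos_]
        else:
            continue
        if not skeleton or skeleton[-1] != cat:  # collapse adjacent repeats
            skeleton.append(cat)
    return "-".join(skeleton)

# Depending on the tagger, this prints a skeleton close to Q-V-N-P-N-Q-V-P-N.
print(query_skeleton("What are the population sizes of cities that are located in California?"))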
In order to resolve ambiguities whereby there are several different solutions to a query with the same cost score, the system's dialog component consults the user by showing a menu from which the user can choose the meaning she/he intended. The meanings offered are based on the possible triples identified by the matching center.

The first version of Atlas was inspired by Querix's dialogue component to resolve ambiguities in capturing correct noun resources from complex noun phrases. Hence, whenever a user tried to add to the database, Atlas would confirm the resource that was being offered before it proceeded with modifying the database. However, for the purposes of user experience, this confirmation dialogue was removed in the version of Atlas presented in this paper. To compensate for the clarification dialogue, a more sophisticated method of capturing the resource noun was employed, which is discussed in more detail in Section 4.

3 ATLAS AS A DIALOGUE-DRIVEN CHATBOT
Atlas is a conversational agent dialog system that lies at the intersection of the following two classes of dialog systems, as described by Jurafsky et al. in their book Speech and Language Processing [12]:

• Chatbots, which are systems designed for extended, casual conversations, are set up to mimic the unstructured conversational or 'chat' characteristic of human-human interaction. These systems often have an entertainment value, but are also often attempts to pass various forms of the Turing test.
• Task-oriented dialog agents, which are designed for a particular task and set up to have short conversations to get information from the user to help complete the task. These include the digital assistants that are now on every cellphone or on home controllers (Siri, Cortana, Alexa, Google Now/Home, etc.), whose dialog agents can give travel directions, control home appliances, find restaurants, or help make phone calls or send texts. Companies deploy goal-based conversational agents on their websites to help customers answer questions or address problems. Conversational agents also play an important role as an interface to robots.

As with any other dialogue system, a dialogue with Atlas consists of multiple turns, each a single contribution to the dialogue. A turn can consist of a sentence, although it might be as short as a single word or as long as multiple sentences.

There are three major chatbot architectures: (i) rule-based systems, (ii) information retrieval systems, which simply copy a human's response from a previous conversation, and (iii) transduction models, which use a machine translation paradigm, such as neural network sequence-to-sequence systems, to learn to map from a user utterance to a system response. Rule-based systems include the early influential ELIZA and PARRY systems. Information retrieval systems and transduction models fall under corpus-based systems, which mine large datasets of human-human conversations. Atlas falls under the corpus-based chatbot system category, as it uses information retrieval techniques for question-answering.

There are also four kinds of task-oriented dialogue management architectures that are most common:

(1) Finite-state and frame-based architectures, which are the simplest and most commercially developed dialogue manager architectures.
(2) Information-state dialogue managers
(3) Probabilistic versions of information-state managers based on Markov Decision Processes
(4) Plan-based architectures

The dialogue manager behind Atlas is based on a finite-state architecture. Atlas is both a system-initiative and a user (single) initiative system: it is system initiative since it mostly controls the conversation with the user when it asks for information, but it is user initiative because it hands over control to the user when answering user questions. Pure user initiative systems are generally used for stateless database querying systems, where the user asks single questions of the system, which the system converts into SQL database queries, and returns the results from some database. The limited user initiative finite-state dialogue manager architecture has the advantage that the system always knows what question the user is answering. Knowing what the user is going to be talking about also makes the task of the natural language understanding engine easier. Most finite-state systems also allow universal commands. Universals are commands that can be said anywhere in the dialogue; every dialogue state recognizes the universal commands in addition to the answer to the question that the system just asked. In its current phase, Atlas does not have any universal commands, but in later versions I plan on incorporating common universals such as help (which will give the user a help message), start over (which returns the user to some specified main start state) and some sort of command to correct the system's understanding of the user's last statement (San-Segundo et al., 2001).
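The sketch below illustrates the finite-state idea in miniature: each state asks one question and names its successor, and a single help universal is recognized in every state. The states, prompts, and the universal are illustrative assumptions, not Atlas' actual state machine (Atlas currently has no universals).

# A toy system-initiative finite-state dialogue manager.
STATES = {
    "ask_resource": {"prompt": "What would you like to share?", "next": "ask_date"},
    "ask_date": {"prompt": "When is it available?", "next": "done"},
    "done": {"prompt": "Thank you, it has been registered.", "next": None},
}

def run_dialogue(answers):
    state = "ask_resource"
    while state is not None:
        print(STATES[state]["prompt"])
        if STATES[state]["next"] is None:
            break
        reply = next(answers)
        if reply.strip().lower() == "help":   # a universal, usable in any state
            print("You can tell me about an item, a service, or an event.")
            continue                          # re-ask the same question
        state = STATES[state]["next"]

run_dialogue(iter(["a car", "this weekend"]))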
As a system-initiative finite-state dialogue manager, Atlas can use universals for very simple tasks such as entering a credit card number or a name.
However, pure system-initiative finite-state dialogue manager architectures are probably too restricted for users, because they require that the user answer exactly the question that the system asked, which can potentially make a dialogue awkward and annoying. As a form-based dialogue system, Atlas has two types of prompts:

• An open prompt, which gives the user very few constraints, allowing the user to respond however they please, as in: How may I help you?
• A directive prompt, which explicitly instructs the user how to respond: Please enter yes or no, for when Atlas is expecting a yes-no response from the user.

From testing Atlas with human users, I realized that users sometimes want to be able to say something that is not exactly the answer to a single question from the system. For example, users want to describe the resources or events they are sharing with complex sentences that may answer more than one of Atlas' questions at a time.

As shown to be true for Atlas, Jurafsky et al. explain that most systems avoid the pure system-initiative finite-state approach and use an architecture that allows mixed initiative, in which conversational initiative can shift between the system and user at various points in the dialogue. One common mixed initiative dialogue architecture relies on the structure of the frame itself to guide the dialogue. These frame-based or form-based dialogue managers ask the user questions to fill slots in the frame, but allow the user to guide the dialogue by giving information that fills other slots in the frame. Each slot may be associated with a question to ask the user. For future work, I plan on improving Atlas' frame-based dialogue manager such that it asks questions of the user and fills any slot that the user specifies, until it has enough information to perform a database query, and then returns the result. If the user happens to answer two or three questions at a time, the system has to fill in these slots and then remember not to ask the user the associated questions for the slots. Not every slot need have an associated question, since the dialogue designer may not want the user deluged with questions. Nonetheless, the system must be able to fill these slots if the user happens to specify them. This kind of form-filling dialogue manager thus does away with the strict constraints that the finite-state manager imposes on the order in which the user can specify information.

In order for dialogue systems to make sure that they have achieved the correct interpretation of the user's input, they can use two methods: confirming understandings with the user, and rejecting inputs that the system is likely to have misunderstood. Confirmation is just one kind of conversational action that a system can use to express lack of understanding, and various strategies can be employed for confirmation with the user. In its previous versions, Atlas used confirmations to confirm the resource that users were willing to share. However, from user testing, it became clear that this has a negative effect on the user experience. Atlas now uses rejections and gives prompts such as "I'm sorry, I didn't understand that" if it cannot identify the resource being shared.

As a dialogue system developer, I followed the user-centered design principles of Gould and Lewis (1985) in order to choose the right dialogue strategies, architectures, and prompts, by (i) studying the user and task, (ii) building simulations and prototypes, and (iii) iteratively testing the design on users.

4 SYSTEM OVERVIEW
Atlas is implemented in Python3. Python is a high-level object-oriented programming language that has a large number of packages for natural language processing. I decided to use spaCy [10] as the NLP processor instead of going with the traditional NLTK [1]. spaCy is written in Cython, which is a C extension of Python designed to give C-like performance to Python programs.5 Figures 2, 3 and 4 compare spaCy's features, speed, and accuracy, respectively, to those of NLTK and CoreNLP.

5 Bansal, Shivam, et al. "Natural Language Processing Made Easy - Using SpaCy (in Python)." Analytics Vidhya, 4 Apr. 2017.

Figure 2: spaCy's features as compared to those of NLTK and CoreNLP

Figure 3: The speed of tokenizing, tagging and parsing with spaCy as compared to NLTK and CoreNLP

PostgreSQL [19] was used as the RDBMS for Atlas. The set of text search functions and operators introduced with version 9.2 was most useful in storing the descriptors of resources and events.6

6 Alger, Abdullah. "Mastering PostgreSQL Tools: Full-Text Search and Phrase Search." Compose Articles, 31 July 2017.
Figure 4: The accuracy of entity extraction using spaCy as compared to using NLTK and CoreNLP

By obtaining the semantic vector of all the words contained in a sentence, Atlas can search for specifically described resources using certain words in the query. To do this, I use two PostgreSQL functions:

• to_tsvector to create a list of tokens in the tsvector data type, which returns a vector where every token is a lexeme (a unit of lexical meaning) with pointers (the positions in the document), and where words that carry little meaning, such as articles (the) and conjunctions (and, or), are conveniently omitted. The ts stands for "text search".
• to_tsquery to query the vector for occurrences of certain words or phrases. to_tsquery accepts a list of words that will be checked against the normalized vector created. to_tsquery also provides a set of operators such as the AND (&), OR (|), and NEGATION (!) operators (see the example below).
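The query below sketches how the two functions combine; the resources table and its resource and descriptors columns follow the relation described earlier in this paper, but the exact schema and search terms are illustrative.

# Illustrative full-text search: to_tsvector normalizes the stored descriptors
# into lexemes and to_tsquery builds the predicate; '&' requires both words.
FULL_TEXT_SEARCH = """
SELECT resource, descriptors
FROM resources
WHERE to_tsvector('english', descriptors) @@ to_tsquery('english', 'dell & laptop');
"""

Such a statement can be executed through the psycopg cursor introduced next.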
In order to communicate with the PostgreSQL database from Python, I use Psycopg [4] as the PostgreSQL database adapter for Python. Psycopg also provides efficient methods to prevent security bugs that allow SQL injections. This is done by passing the values to be inserted into the database as a sequence to the function cursor.execute() and using %s placeholders in the SQL statement.

SQL = "INSERT INTO authors (name) VALUES (%s);"
data = ("O'Reilly", )
cur.execute(SQL, data)  # cur is an open cursor from a psycopg connection

Psycopg converts Python variables to SQL values using their types, and the Python type determines the function used to convert the object into a string representation suitable for PostgreSQL. It is important to note that for positional variable binding, the second argument must always be a sequence, even if it contains a single variable.

Figure 5 shows the application flow for Atlas, which consists of four phases. Atlas parses sentences that belong to contexts pertaining to the querying and adding of information about services (such as tutoring and rides), sharable items (such as books, cars, pens) and events.

Figure 5: Atlas Application Flow

4.1 Parts-of-speech Tagging
The first phase tags each constituent token of the input sentence with its respective part-of-speech. Parts-of-speech tagging is the process of classifying words into one or more of a set of lexical or parts-of-speech categories such as nouns, pronouns and verbs, much as one would classify string or integer variables. Parts of speech are generally represented by placing the tag after each word, delimited by a slash. Because tags are generally also applied to punctuation, tokenization7 is usually performed before, or as part of, the tagging process: separating commas, quotation marks, and other punctuation from words, and disambiguating end-of-sentence punctuation (period, question mark, etc.) from part-of-word punctuation. The input to a tagging algorithm is a sequence of words and a tagset, and the output is a sequence of tags, a single best tag for each word, as shown in the following example: given the input sentence "The ball is red", the output of the parts-of-speech tagger would be "The/AT ball/NN is/VB red/JJ" [15]. Parts-of-speech tagging is a disambiguation task.

7 Tokenization is the process of splitting a sentence into its constituent tokens.
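The snippet below sketches this first phase with spaCy; token.tag_ exposes the fine-grained (Penn Treebank) tag and token.pos_ the coarse category, although the exact tags produced depend on the loaded model.

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

doc = nlp("The ball is red")
# Print each token with its fine-grained tag, in word/TAG style.
print(" ".join(f"{token.text}/{token.tag_}" for token in doc))
# e.g. The/DT ball/NN is/VBZ red/JJ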
It is crucial for the generated parts-of-speech tags to be accurate in order for Atlas to correctly classify the input sentence. Parts of speech are defined based on syntactic and morphological function, grouping words that have similar neighboring words or that take similar affixes. Parts of speech can be divided into two categories: (i) closed class types and (ii) open class types. Closed classes are those with relatively fixed membership (such as prepositions), while open classes are continually being created or borrowed (such as nouns and verbs, where new ones like iPhone or to fax are constantly being created). Closed class words are generally function words like of, it, and, or you, which tend to be very short, occur frequently, and often have structuring uses in grammar. The four major open classes are:

(1) Nouns - include concrete terms, abstractions, as well as verb-like terms. The characteristics of a noun that define it in English relate to the noun's ability to occur with determiners, to take possessives, and to occur in the plural form. Open class nouns fall into two classes: (i) proper nouns and (ii) common nouns, which are divided into count nouns (which allow grammatical enumeration and can be counted) and mass nouns (which are used in instances where things are conceptualized as homogeneous groups).
(2) Verbs - include most of the words referring to actions and processes. English verbs have the following inflections: (i) non-third-person-sg, (ii) third-person-sg, (iii) progressive, and (iv) past participle. Atlas is able to understand mostly non-third-person-sg and progressive verb inflections.
(3) Adjectives - include many terms for properties or qualities.
(4) Adverbs - there are different types of adverbs:
• Directional adverbs (locative adverbs) specify the direction or location of some action.
• Degree adverbs specify the extent of some action, process, or property.
• Manner adverbs describe the manner of some action or process.
• Temporal adverbs describe the time that some action or event took place.
Because of the heterogeneous nature of the adverb class, some adverbs (such as temporal adverbs) are tagged in some tagging schemes as nouns. The existential there is captured as an adverb by Atlas and can be used to indicate the existence of a resource or an event.

Atlas further recognizes some of the following important closed classes in English:

• Prepositions occur before noun phrases, and they often semantically indicate spatial or temporal relations (literal or metaphorical), but often indicate other relations as well.
• Particles resemble prepositions or adverbs and are used in combination with verbs. Particles often have extended meanings that are not quite the same as the prepositions they resemble. When a verb and a particle behave as a single syntactic and/or semantic unit, the combination is known as a phrasal verb. Phrasal verbs behave as a semantic unit with a non-compositional meaning (one that is not predictable from the distinct meanings of the verb and the particle, e.g. going on means happening).
• Determiners fall under the closed class and occur with nouns, often marking the beginning of noun phrases. A subtype of determiners is the article. Articles are quite frequent in English. There are three articles in the English language: a, an, and the. Other determiners include this and that. A and an mark a noun phrase as indefinite, while the can mark it as definite.

Most modern language processing on English uses the 45-tag Penn Treebank tagset (Marcus et al., 1993), shown in Figure 6. This tagset has been used to label a wide variety of corpora, including the Brown corpus, the Wall Street Journal corpus, and the Switchboard corpus. spaCy [10] is used to perform the parts-of-speech tagging in Atlas.

Figure 6: Penn Treebank part-of-speech tags (including punctuation)

4.2 Classification
The second phase uses the tagged sentence to classify the sentence as either a general sentence, a database query, or a database modification. There are a large number of constructions for English sentences, but four are particularly common and important:

(1) declarative structure - these sentences have a subject noun phrase followed by a verb phrase, like "I prefer a morning flight".
(2) imperative structure - these sentences often begin with a verb phrase and have no subject. They are called imperative because they are almost always used for commands and suggestions.
(3) yes-no-question structure - these sentences are often used to ask questions, and begin with an auxiliary verb, followed by a subject NP, followed by a VP.
(4) wh-question structure - this structure is identical to the declarative structure, except that the first noun phrase contains some wh-word, which is usually a wh-pronoun used in question form. In the wh-non-subject-question structure, the wh-phrase is not the subject of the sentence, and so the sentence includes another subject. In these types of sentences the auxiliary appears before the subject NP, just as in the yes-no-question structure. Constructions like the wh-non-subject-question contain what are called long-distance dependencies.

One challenge in distinguishing between a database query and a database modification arises from sentences containing the subclass of verbs called auxiliaries. Auxiliaries are also known as helping verbs and have particular syntactic constraints which can be viewed as a kind of subcategorization. Auxiliary verbs are a closed class subtype of English verbs that include the copula verb be, the two verbs do and have, along with their inflected forms, as well as a class of modal verbs. Be is called a copula because it connects subjects with certain kinds of predicate nominals and adjectives. It is used as part of the progressive construction, which Atlas recognizes. Auxiliaries include the modal verbs can, could, may, might, must, will, would, shall, and should, the perfect auxiliary have, and the progressive auxiliary be. Each of these verbs places a constraint on the form of the following verb, and each of these must also combine in a particular order. The modals are used to mark the mood associated with the event or action depicted by the main verb. In addition to the perfect have mentioned above, there is a modal verb have (e.g., I have to go), which Atlas looks for in order to trigger certain responses.
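A simplified sketch of this classification step is shown below. Treating a trailing question mark as a query marker follows the test cases in Section 5.1; the trigger-word sets themselves are illustrative assumptions rather than Atlas' full lists.

import spacy

nlp = spacy.load("en_core_web_sm")

QUERY_VERBS = {"find", "show", "list"}     # illustrative imperative query triggers
MODIFY_VERBS = {"have", "offer", "share"}  # illustrative modification triggers

def classify(sentence):
    """Classify a sentence as a database query, a database modification, or general."""
    if sentence.strip().endswith("?"):
        return "database query"            # yes-no and wh-questions
    doc = nlp(sentence)
    first_verb = next((t for t in doc if t.pos_ in {"VERB", "AUX"}), None)
    if first_verb is not None and first_verb.lemma_.lower() in QUERY_VERBS:
        return "database query"            # imperative, e.g. "Find rides to Walmart"
    if first_verb is not None and first_verb.lemma_.lower() in MODIFY_VERBS:
        return "database modification"     # declarative offer, e.g. "I have a car"
    return "general sentence"

for s in ["I have a car", "Find rides to Walmart", "Are there Dell laptops on sale?"]:
    print(s, "->", classify(s))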
4.3 Conversion
After classifying the input sentence correctly, the third phase converts the input sentence into a SQL command accordingly.
If the input sentence is classified as a database query, the SQL statement is a SELECT statement from the Resources or Events table. If the input sentence is classified as a database modification, then it is an INSERT statement into the Resources or Events table. In this phase, it is very important that Atlas identifies the correct resource that is being offered or the correct date and time of the event that is being registered.

Recognizing dates and times is done using a Python module called datefinder. Atlas is able to recognize the following types of temporal expressions:

• Absolute temporal expressions are those that can be mapped directly to calendar dates, times of day, or both.
• Relative temporal expressions map to particular times via some other reference point.
• Durations denote spans of time at varying levels of granularity (seconds, minutes, days, weeks, centuries, etc.). Durations primarily consist of lengths of time, but may also include information concerning the start and end points of a duration when that information is available.

Jurafsky et al. explain that temporal expressions are syntactic constructions that have temporal lexical triggers as their heads [12]. In the annotation scheme in widest use, lexical triggers can be nouns, proper nouns, adjectives, and adverbs; full temporal expressions consist of their phrasal projections: noun phrases, adjective phrases and adverbial phrases.

The temporal expression recognition task consists of finding the start and end of all of the text spans that correspond to such temporal expressions. Although there are myriad ways to compose time expressions in English, the set of temporal trigger terms is, for all practical purposes, static, and the set of constructions used to generate temporal phrases is quite conventionalized.

Temporal normalization refers to the process of mapping a temporal expression to either a specific point in time or to a duration. Points in time correspond either to calendar dates or to times of day (or both).

Another type of temporal expression, which is rare in real texts, is the fully qualified temporal expression, which contains a year, month and day in some conventional form. Temporal expressions in news articles are usually incomplete and only implicitly anchored, often with respect to the dateline of the article, which is referred to as the document's temporal anchor. As immediate future work, I will implement a combination of regular expressions and datefinder in order to compute the values of relatively simple temporal expressions such as today, yesterday, or tomorrow with respect to the temporal anchor, which will be the current time of use of Atlas. The semantic procedure for today simply assigns the anchor, while the attachments for tomorrow and yesterday add a day to and subtract a day from the anchor, respectively. Given the circular nature of representations for months, weeks, days and times of day, the temporal arithmetic procedures must use modulo arithmetic appropriate to the time unit being used. Unfortunately, even simple expressions such as the weekend or Wednesday introduce a fair amount of complexity. Relative temporal expressions are handled with temporal arithmetic similar to that used for today and yesterday. As a longer-term future implementation, I will implement the task of event detection and classification in order to identify mentions of events in texts and then assign those events to a variety of classes.
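The sketch below shows how datefinder can pull absolute expressions out of a sentence and how a few relative words could be resolved against the temporal anchor; the handling of today, tomorrow and yesterday is an illustrative assumption layered on top of datefinder, not part of the library itself.

import datetime
import datefinder

def find_times(sentence, anchor=None):
    """Return datetimes mentioned in the sentence, resolving a few relative words."""
    anchor = anchor or datetime.datetime.now()        # the temporal anchor
    found = list(datefinder.find_dates(sentence))     # absolute expressions
    lowered = sentence.lower()
    if "today" in lowered:
        found.append(anchor)
    if "tomorrow" in lowered:
        found.append(anchor + datetime.timedelta(days=1))
    if "yesterday" in lowered:
        found.append(anchor - datetime.timedelta(days=1))
    return found

print(find_times("I can give rides to Walmart on May 3 at 5pm"))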
Identifying the correct resource that is being offered involves distinguishing the descriptors of a resource from the actual resource. One challenge of identifying a noun phrase (a group of words which often acts as a single unit) is that noun phrases can occur before verbs. An English noun phrase can have determiners, numbers, quantifiers, and adjective phrases preceding the head noun, which can be followed by a number of postmodifiers. The three most frequent types of noun phrases that occur in English are pronouns, proper nouns and the

NP → Det Nominal

construction. These noun phrases consist of a head (the central noun in the noun phrase) along with various modifiers that can occur before or after the head noun. A number of different kinds of word classes can appear before the head noun (the "postdeterminers") in a nominal. These include cardinal numbers, ordinal numbers, and quantifiers. Ordinal numbers include first, second, third, and so on, but also words like next, last, past, other, and another. Adjectives occur after quantifiers but before nouns and can also be grouped into a phrase called an adjective phrase or AP. A head noun can also be followed by postmodifiers. The three common kinds of nominal postmodifiers are:

(1) prepositional phrases: all flights from Cleveland
(2) non-finite clauses: any flights arriving after eleven a.m.
(3) relative clauses: a flight that serves breakfast

An alternative approach to head-finding that is used in most practical computational systems is to identify heads dynamically in the context of trees for specific sentences, instead of specifying head rules in the grammar itself. Magerman (1995) originally developed a simple set of hand-written rules for finding the head of an NP, given as follows by Collins (1999, 238):

(1) If the last word is tagged POS, return the last word.
(2) Else search from right to left for the first child which is an NN, NNP, NNPS, NX, POS, or JJR.
(3) Else search from left to right for the first child which is an NP.
(4) Else search from right to left for the first child which is a $, ADJP, or PRN.
(5) Else search from right to left for the first child which is a CD.
(6) Else search from right to left for the first child which is a JJ, JJS, RB or QP.
(7) Else return the last word.
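Transcribed directly into code, these rules might look like the sketch below, where an NP is represented simply as a list of (word, tag) pairs; this is an illustration of the listed rules, not code taken from Atlas.

def find_np_head(children):
    """children: list of (word, tag) pairs for the children of an NP node."""
    words = [w for w, _ in children]
    tags = [t for _, t in children]
    if tags[-1] == "POS":                                             # rule 1
        return words[-1]
    for i in range(len(children) - 1, -1, -1):                        # rule 2, right to left
        if tags[i] in {"NN", "NNP", "NNPS", "NX", "POS", "JJR"}:
            return words[i]
    for i in range(len(children)):                                    # rule 3, left to right
        if tags[i] == "NP":
            return words[i]
    for tagset in ({"$", "ADJP", "PRN"}, {"CD"}, {"JJ", "JJS", "RB", "QP"}):  # rules 4-6
        for i in range(len(children) - 1, -1, -1):
            if tags[i] in tagset:
                return words[i]
    return words[-1]                                                  # rule 7

print(find_np_head([("the", "DT"), ("morning", "NN"), ("flight", "NN")]))  # -> flight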
There are also word classes that modify NPs and appear before them; these are called predeterminers (e.g. all). Many of these have to do with number or amount.

In the context of modifying the database with a natural language sentence, Atlas defines strict rules on the sentence structure in order to ensure that the resource or event and its descriptors are correctly and accurately identified. Some of these rules include:

• Ensuring that what comes after the database-modifying verb has a part-of-speech tag that is one of the following: determiner, adjective, plural noun, singular noun, gerund (a verb in its present participle "ing" form that functions as a noun), pronoun, or cardinal number.
• Ensuring that the actual resource has a part-of-speech tag that is a noun or gerund.
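A condensed sketch of these two checks is given below using spaCy's Penn Treebank tags; the allowed tag sets follow the rules above, while the trigger verbs and the way descriptors are collected are illustrative assumptions rather than Atlas' exact implementation.

import spacy

nlp = spacy.load("en_core_web_sm")

# Tags allowed to follow a database-modifying verb (first rule above).
ALLOWED_AFTER_VERB = {"DT", "JJ", "NNS", "NN", "VBG", "PRP", "CD"}
# Tags the resource itself may carry (second rule above): a noun or gerund.
RESOURCE_TAGS = {"NN", "NNS", "VBG"}
MODIFY_VERBS = {"have", "offer", "share"}  # illustrative trigger verbs

def extract_resource(sentence):
    """Return (resource, descriptors) or None if the sentence breaks the rules."""
    doc = nlp(sentence)
    for i, token in enumerate(doc):
        if token.lemma_.lower() in MODIFY_VERBS and i + 1 < len(doc):
            if doc[i + 1].tag_ not in ALLOWED_AFTER_VERB:
                return None                        # first rule violated
            descriptors = []
            for tok in doc[i + 1:]:
                if tok.tag_ in RESOURCE_TAGS:
                    return tok.text, descriptors   # second rule: noun or gerund
                descriptors.append(tok.text)
            return None
    return None

print(extract_resource("I have a red car"))  # e.g. ('car', ['a', 'red'])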
4.4 Retrieval
The final phase involves returning the results of running the SQL command. If the SQL command is a query (on either the Resources or the Events table) and there are matching entries in the database, then a table of the results is returned. However, if there are no entries in the database for that specific query, then a follow-up message is returned. If the SQL command is a modification (to either the Resources or the Events table), then a simple confirmation is returned.

5 RESULTS AND DISCUSSION
The challenges of natural language ambiguities are resolved in the second phase of the program, classification. This is a crucial phase that largely determines whether the correct SQL statement is generated from the natural language sentence that was input. Therefore, I conducted an experiment to test the accuracy (the percentage of all classifications where Atlas and the gold standard agree) of the system as well as the F1-measure of each dialogue tree's classification.

The demographics of the test user group included a multicultural student body, ages 18-24, who attended a small liberal arts college called Earlham College in Richmond, Indiana. An iOS mobile application connected to a Python3 Flask server was used as the user interface to test the proposed system (see Acknowledgements).

A test dataset was produced based on the results of user testing. It contained labeled lines of input strings from the test users. A test script was written in Python to input the sentences one after the other to Atlas and compare Atlas' returned response (which is a simple string classification of the input sentence) with a human-labeled gold standard test set. Based on the results of the test, I evaluated the classification of the system in general as well as the binary classification of each dialogue tree.

The accuracy of the system based on the results of the testing was 81.0%. The following confusion matrix was produced based on the results:

Figure 7: Confusion Matrix representing the results of the experiment.

Furthermore, the precision, recall, and F1-measure of each dialogue tree were calculated. These metrics were used to evaluate the binary classification of each dialogue tree. Precision is the percentage of relevant instances among the retrieved instances and is therefore known as positive predictive value:

Precision = TruePositives / (TruePositives + FalsePositives)

Recall is the percentage of relevant instances that have been retrieved over the total amount of relevant instances. Recall is therefore also known as sensitivity:

Recall = TruePositives / (TruePositives + FalseNegatives)

A false positive is a type I error, the incorrect rejection of a true null hypothesis, while a false negative is a type II error, incorrectly retaining a false null hypothesis.8 The F1-measure is the harmonic mean of precision and recall; it reaches its best value at 1 (perfect precision and recall) and its worst at 0. The F1-measure is given by:

F1 = (2 × Precision × Recall) / (Precision + Recall)

8 "Type I and Type II Errors." Wikipedia, Wikimedia Foundation, 10 Apr. 2018.

The average recall was 86.0% and the average precision was 81.0%. These two metrics contributed to an average F1-measure of 83.0%.
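For reference, the short sketch below computes the three metrics for a single dialogue tree from its true positive, false positive and false negative counts; the counts are toy placeholders, not Atlas' measured values.

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy counts for one dialogue tree's binary classification.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=1)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")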
Figure 8: The precision, recall and F1-measure of each dialogue tree.

5.1 Test Cases
The test sentences were categorized according to the four types of sentence-level constructions discussed previously under Section 4. The following examples of each of the sentence types were tested:

• "I have a car" - this is a declarative sentence, where Atlas recognizes "have" as one of the modifying verbs that trigger the modifyResource dialogue, which registers a car as a resource in the Resources table.
• "Find rides to Walmart" - this is an imperative sentence, where the querying verb "find" triggers the queryResource dialogue, which then proceeds to query the Resources table for rides.
• "Are there Dell laptops on sale?" - despite the name, yes-no questions return the same type of results as any other querying question. This is because yes-no questions contain a question mark, which triggers either the queryResource or queryEvent dialogue depending on the trigger words in the sentence. In this sentence, the question mark is used to categorize it as a general query. Atlas then proceeds to identify the actual resource that is being queried (laptop) and its descriptors (Dell).
• "What is happening on Sunday?" - just like yes-no questions, wh-questions contain a question mark, which triggers either the queryResource or queryEvent dialogue depending on the trigger words in the sentence. In this sentence, the question mark is used to categorize it as a general query, and then the trigger word "happening" is used to trigger the queryEvent dialogue.
In all of these cases, once the correct dialogue captures the sentence, the sentence is passed on to the next stage of the program, where the actual resource being offered or the correct date and time are identified. A Python dictionary is used to contain all the dialogues between Atlas and each user. Each Atlas-user dialogue has a key, which is the user's id, while the value is a list containing the necessary information for Atlas to maintain the dialogue.
the necessary information for Atlas to maintain the dialogue. (2014).
[9] Marti A Hearst. 1992. Automatic acquisition of hyponyms from large text cor-
pora. In Proceedings of the 14th conference on Computational linguistics-Volume 2.
6 CONCLUSION AND FUTURE WORK Association for Computational Linguistics, 539–545.
[10] Matthew Honnibal and Mark Johnson. 2015. An Improved Non-monotonic Tran-
In this paper, I proposed a dialogue-driven chatbot called Atlas, sition System for Dependency Parsing. In Proceedings of the 2015 Conference on
which is based on a NLIDB and whose purpose is to democratize Empirical Methods in Natural Language Processing. Association for Computational
Linguistics, Lisbon, Portugal, 1373–1378. https://aclweb.org/anthology/D/D15/
SQL and ensure that all members of a community are easily able to D15-1162
access and contribute to a relational database. Atlas acts and thinks [11] Wolfram Research, Inc. [n. d.]. SystemModeler, Version 5.0. ([n. d.]). Champaign,
humanly, and allows users to add to and query a SQL database IL, 2017.
[12] Dan Jurafsky and James H Martin. 2014. Speech and language processing. Vol. 3.
containing information about the resources that are available within Pearson London:.
a community, thereby resulting in a more equitable distribution [13] Esther Kaufmann, Abraham Bernstein, and Renato Zumstein. 2006. Querix: A
natural language interface to query ontologies based on clarification dialogs. In
and a sustainable use of the community’s resources. The results of 5th International Semantic Web Conference (ISWC 2006). Springer, 980–981.
testing Atlas at Earlham College with a multicultural student body [14] Dan Klein and Christopher D Manning. 2003. Accurate unlexicalized parsing.
showed that Atlas is able to correctly classify a significant portion In Proceedings of the 41st annual meeting of the association for computational
linguistics.
of the test cases, achieving an accuracy of 81.0% and an average [15] Nitin Madnani. 2007. Getting started on natural language processing with Python.
F1-measure of 83.0% for the dialogue trees. Crossroads 13, 4 (2007), 5–5.
As future work, I will test how much Machine Learning tools [16] Kavi Mahesh and Sergei Nirenburg. 1996. Knowledge-based systems for natu-
ral language processing. New Mexico State University, Computing Research
such as Word2vec [8] can improve the accuracy of the system. Laboratory.
Word2vec provides an efficient implementation of the continuous [17] Ana-Maria Popescu, Alex Armanasu, Oren Etzioni, David Ko, and Alexander
Yates. 2004. Modern natural language interfaces to databases: Composing statis-
bag-of-words and skip-gram architectures for computing vector tical parsing with semantic tractability. In Proceedings of the 20th international
representations of words, and hence it can be used to associate conference on Computational Linguistics. Association for Computational Linguis-
synonyms together, such as car and ride. Furthermore, in order to tics, 141.
[18] Silvia Quarteroni and Suresh Manandhar. 2007. A chatbot-based interactive
capture complex compound sentences, parsing the input sentence question answering system. Decalog 2007 (2007), 83.
and producing a Parse Tree would generate more accurate results. [19] R Development Core Team. 2008. R: A Language and Environment for Statistical
The Stanford Parser’s lexicalized PCFG parser [14] can be used to Computing. R Foundation for Statistical Computing, Vienna, Austria. http:
//www.R-project.org ISBN 3-900051-07-0.
determine the grammatical structure of a sentence. For example, [20] Filbert Reinaldha and Tricya E Widagdo. 2014. Natural language interfaces to
groups of words that go together (as "phrases") as well as words database (nlidb): Question handling and unit conversion. In Data and Software
Engineering (ICODSE), 2014 International Conference on. IEEE, 1–6.
that are the subject or object of a verb can be determined using the [21] Stuart J Russell and Peter Norvig. 2016. Artificial intelligence: a modern approach.
probabilistic parser. Malaysia; Pearson Education Limited,.
[22] Iulian V Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang,
Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chan-
7 ACKNOWLEDGEMENTS dar, Nan Rosemary Ke, et al. 2017. A deep reinforcement learning chatbot. arXiv
This work was made possible through the guidance and support of preprint arXiv:1709.02349 (2017).
[23] Peter Turney. 2001. Mining the web for synonyms: PMI-IR versus LSA on TOEFL.
my senior capstone advisor/mentor, David Barbella. I would like Machine Learning: ECML 2001 (2001), 491–502.
to thank the NadaBot team for helping in the user testing phase of [24] Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating
Structured Queries from Natural Language using Reinforcement Learning. arXiv
the proposed system. Specifically, I want to extend my gratitude to preprint arXiv:1709.00103 (2017).
James Tran ’18, for building the iOS mobile application, and Lam
Nguyen ’18, for building the Python Flask server. The iOS applica-
tion combined with the server allowed for a user interface that was
used by students at Earlham College to test the proposed system.
Finally, I would like to thank the Computer Science department at
Earlham for their support, and the Earlham community in general.

REFERENCES
[1] Steven Bird. 2006. NLTK: the natural language toolkit. In Proceedings of the
COLING/ACL on Interactive presentation sessions. Association for Computational
Linguistics, 69–72.
[2] Eric Brill. 1994. Some advances in transformation-based part of speech tagging.
arXiv preprint cmp-lg/9406010 (1994).
[3] Nancy Chinchor and Elaine Marsh. 1998. Muc-7 information extraction task
definition. In Proceeding of the seventh message understanding conference (MUC-7),
Appendices. 359–367.
