
Semantics of Paragraphs

Wlodek Zadrozny
Karen Jensen
IBM T. J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598
We present a computational theory of the paragraph. Within it we formally define coherence,
give semantics to the adversative conjunction "but" and to the Gricean maxim of quantity,
and present some new methods for anaphora resolution.
The theory precisely characterizes the relationship between the content of the paragraph
and background knowledge needed for its understanding. This is achieved by introducing a
new type of logical theory consisting of an object level, corresponding to the content of the
paragraph, a referential level, which is a new logical level encoding background knowledge,
and a metalevel containing constraints on models of discourse (e.g. a formal version of Gricean
maxims). We propose also specific mechanisms of interaction between these levels, resembling
both classical provability and abduction. Paragraphs are then represented by a class of structures
called p-models.
1. Introduction
Logic and knowledge have been often discussed by linguists. Anaphora is another prominent subject in linguistic analyses. Not so frequently examined are different types of cohesion. And it is quite rare to find the word "paragraph" in articles or books about natural language understanding, although paragraphs are grammatical units and units of discourse. But it is possible to speak formally about the role of background knowledge, cohesion, coherence and anaphora--all within one, flexible and natural, logical system--if one examines the semantic role of the linguistic construct called a paragraph.

Paragraphs have been sometimes described, rather loosely, as "units of thought." We establish a correspondence between them and certain types of logical models, thereby making the characterization of paragraphs more precise. The correspondence gives us also an opportunity to identify and attack--with some success, we believe--three interesting and important problems: (1) how to define formally coherence and topic, (2) how to resolve anaphora, and (3) what is the formal meaning of linkages (connectives) such as but, however, and, certainly, usually, because, then, etc. These questions are central from our point of view because: (1) the "unity" of a paragraph stems from its coherence, while the "aboutness" of thought can be, at least to some extent, described as existence of a topic; (2) without determining reference of pronouns and phrases, the universes of the models are undefined; and (3) the linkages, which make sentences into paragraphs, have semantic roles that must be accounted for. We can explain then the process of building a computational model of a paragraph (a p-model) as an interaction between its sentences, background knowledge to which these sentences refer, and metatheoretical operators that indicate types of permitted models.
At this point the reader may ask: what is so special about paragraphs; does all this mean that a chapter, a book or a letter do not have any formal counterparts? We believe they do. But we simply do not yet know how corresponding formal structures would be created from models of paragraphs. To answer this question we may need more advanced theories of categorization and learning than exist today. On the other hand, the paragraph is the right place to begin: it is the next grammatical unit after the sentence; connectives providing cohesion operate here, not at the level of an entire discourse; and it is the smallest reasonable domain of anaphora resolution. Furthermore, we will argue, it is the smallest domain in which topic and coherence can be defined.
The formalization of paragraph structure requires the introduction of a new type of logical theory and a corresponding class of models. As we know, the usual logical structures consist of an object level theory T and provability relation ⊢; within the context of the semantics of natural language, the object theory contains a logical translation of the surface form of sentences, and ⊢ is the standard provability relation (logical consequence). In mathematical logic, this scheme is sometimes extended by adding a metalevel assumption, for instance postulating the standardness of natural numbers. In artificial intelligence, a metarule--typically, the closed world assumption of circumscription--can be used in dealing with theoretical questions, like the frame problem. But a formal account of natural language understanding requires more. It requires at least a description (a) of how background knowledge about objects and relations that the sentences describe is used in the process of understanding, and (b) of general constraints on linguistic communications, as expressed for instance in Gricean maxims. It is well known that without the former it is impossible to find references of pronouns or attachments of prepositional phrases; background knowledge, as it turns out, is also indispensable in establishing coherence. We have then reasons for introducing a new logical level--a referential level R, which codes the background knowledge. As for Gricean maxims, we show that they can be expressed formally and can be used in a computational model of communication. We include them in a metalevel M, which contains global constraints on models of a text and definitions of meta-operators such as the conjunction but. We end up with three-level logical theories (M, T, R, ⊢R+M), where a provability relation ⊢R+M, based on R and M, can be used, for example, to establish the reference of pronouns.
This work is addressed primarily to our colleagues working on computational models of natural language; but it should be also of interest to linguists, logicians, and philosophers. It should be of interest to linguists because the notion that a paragraph is equivalent to a model is something concrete to discuss; because p-models are as formal as formal languages (and therefore something satisfyingly theoretical to argue about); and because new directions for analysis are opened beyond the sentence. The work should be of interest to logicians because it introduces a new type of three-level theory, and corresponding models. The theory of these structures, which are based on linguistic constructs, will differ from classical model theory--for instance, by the fact that names of predicates of an object theory matter, because they connect the object theory with the referential level. This work should be of interest to philosophers for many of the same reasons: it makes more sense to talk about the meaning of a paragraph than about the meaning of a sentence. The following parallel can be drawn: a sentence is meaningful only with respect to a model of a paragraph, exactly as the truth value of a formula can be computed only with respect to a given model. Moreover, it is possible in this framework to talk about meaning without mentioning the idea of possible worlds. However, we do not identify meaning with truth conditions; in this paper, the meaning of a sentence is its role in the model of the paragraph
in which this sentence occurs. Our intuitive concept of meaning is similar to Lakoff's (1987) Idealized Cognitive Model (ICM). Needless to say, we believe in the possibility of formalizing ICMs, although in this paper we will not try to express, in logic, prototype effects, metaphors, or formal links with vision.

The paper is presented in six sections. In Section 2, we discuss the grammatical function of the paragraph and we show, informally, how a formal model of a paragraph might actually be built. In Section 3 we give the logical preliminaries to our analysis. We discuss a three-part logical structure that includes a referential level, and we introduce a model for plausible meaning. Section 4 discusses paragraph coherence, and Section 5 constructs a model of a paragraph, a p-model, based on the information contained in the paragraph itself and background information contained in the referential level R. Section 5 further motivates the use of the referential level, showing how it contributes to the resolution of anaphoric reference. In Section 6, we broaden our presentation of the metalevel, introducing some metalevel axioms, and sketching ways by which they can be used to reduce ambiguity and construct new models. We also show metalevel rules for interpreting "but."
2. The Paragraph as a Discourse Unit

2.1 Approaches to Paragraph Analysis
Recent syntactic theory--that is, in the last 30 years--has been preoccupied with sentence-level analysis. Within discourse theory, however, some significant work has been done on the analysis of written paragraphs. We can identify four different linguistic approaches to paragraphs: prescriptivist, psycholinguist, textualist, and discourse-oriented.

The prescriptivist approach is typified in standard English grammar textbooks, such as Warriner (1963). In these sources, a paragraph is notionally defined as something like a series of sentences that develop one single topic, and rules are laid down for the construction of an ideal (or at least an acceptable) paragraph. Although these dictates are fairly clear, the underlying notion of topic is not.

An example of psycholinguistically oriented research work can be found in Bond and Hayes (1983). These authors take the position that a paragraph is a psychologically real unit of discourse, and, in fact, a formal grammatical unit. Bond and Hayes found three major formal devices that are used, by readers, to identify a paragraph: (1) the repetition of content words (nouns, verbs, adjectives, adverbs); (2) pronoun reference; and (3) paragraph length, as determined by spatial and/or sentence-count information. Other psycholinguistic studies that confirm the validity of paragraph units can be found in Black and Bower (1979) and Haberlandt et al. (1980).
The textualist approach to paragraph analysis is exemplified by E. J. Crothers. His work is taxonomic, in that he performs detailed descriptive analyses of paragraphs. He lists, classifies, and discusses various types of inference, by which he means, generally, "the linguistic-logical notions of consequent and presupposition" (Crothers 1979:112).

Discourse-oriented linguists have collected convincing evidence of the existence of language chunks--real structures, not just orthographic conventions--that are smaller than a discourse, larger than a sentence, generally composed of sentences, and recursive in nature (like sentences). These chunks are sometimes called "episodes," and sometimes "paragraphs." According to Hinds (1979), paragraphs are made up of segments, which in turn are made up of sentences or clauses, which in turn are made up of phrases. Paragraphs therefore give hierarchical structure to sentences. Hinds discusses three major types of paragraphs, and their corresponding segment types. The three types are procedural (how-to), expository (essay), and narrative (in this case, spontaneous conversation). For each type,
173
Computational Linguistics Volume 17, Number 2
its segments are distinguished by bearing distinct relationships to the paragraph topic (which is central, but nowhere clearly defined). Segments themselves are composed of clauses and regulated by "switching" patterns, such as the question-answer pattern and the remark-reply pattern.
2.2 Our View of Paragraphs: An Informal Sketch
Although there are other discussions of the paragraph as a central element of discourse (e.g. Chafe 1979, Halliday and Hasan 1976, Longacre 1979, Haberlandt et al. 1980), all of them share a certain limitation in their formal techniques for analyzing paragraph structure. Discourse linguists show little interest in making the structural descriptions precise enough so that a computational grammar of text could adapt them and use them. Our interest, however, lies precisely in that area.

We suggest that the paragraph is a grammatical and logical unit. It is the smallest linguistic representation of what, in logic, is called a "model," and it is the first reasonable domain of anaphora resolution, and of coherent thought about a central topic.
A paragraph can be thought of as a grammatical unit in the following sense: it is the discourse unit in which a functional (or a predicate-argument) structure can be definitely assigned to sentences/strings. For instance, Sells (1985, p. 8) says that the sentence "Reagan thinks bananas," which is otherwise strange, is in fact acceptable if it occurs as an answer to the question "What is Kissinger's favorite fruit?" The pairing of these two sentences may be said to create a small paragraph. Our point is that an acceptable structure can be assigned to the utterance "Reagan thinks bananas" only within the paragraph in which this utterance occurs. We believe that, in general, no unit larger than a paragraph is necessary to assign a functional structure to a sentence, and that no smaller discourse fragment, such as two (or one) neighboring sentences, will be sufficient for this task. That is, we can ask in the first sentence of a paragraph about Kissinger's favorite fruit, elaborate the question and the circumstances in the next few sentences, and give the above answer at the end. We do not claim that a paragraph is necessarily described by a set of grammar rules in some grammar formalism (although it may be); rather, it has the grammatical role of providing functional structures that can be assigned to strings.
The logical structure of paragraphs will be analyzed in the next sections. At this point we would like to present some intuitions that led to this analysis. But first we want to identify our point of departure. In order to resolve anaphora and to establish the coherence or incoherence of a text, one must appeal to the necessary background knowledge. Hence, a formal analysis of paragraphs must include a formal description of background knowledge and its usage. Furthermore, this background knowledge cannot be treated as a collection of facts or formulas in some formal language, because that would preclude dealing with contradictory word senses, or multiple meanings. Secondly, this background knowledge is not infinite and esoteric. In fact, to a large extent it can be found in standard reference works such as dictionaries and encyclopedias. To argue for these points, we can consider the following paragraph:¹

In the summer of 1347 a merchant ship returning from the Black Sea entered the Sicilian port of Messina bringing with it the horrifying disease that came to be known as the Black Death. It struck rapidly. Within twenty-four hours of infection and the appearance of the first small black pustule came an agonizing death. The effect of the Black Death was appalling. In less than twenty years half the population of Europe had been killed, the countryside devastated, and a period of optimism and growing economic welfare had been brought to a sudden and catastrophic end.

¹ J. Burke, The Day the Universe Changed. 1986. Little, Brown & Co., Boston, Massachusetts, p. 55.
The sentences that compose a paragraph must stick together; to put it more technically, they must cohere. This means very often that they show cohesion in the sense of Halliday (1976)--semantic links between elements. Crucially, also, the sentences of a paragraph must all be related to a topic.

However, in the example paragraph, very few instances can be found here of the formal grammatical devices for paragraph cohesion. There are no connectives, and there are only two anaphoric pronouns (both occurrences of "it"). In each case, there are multiple possible referents for the pronoun. The paragraph is coherent because it has a topic: "Black Death"; all sentences mention it, explicitly or implicitly.
Notice that resolving anaphora precedes the discovery of a topic. A few words about this will illustrate the usage of background knowledge. By parsing with syntactic information alone, we show that resolution of the first "it" reference hinges on the proper attachment of the participial clause "bringing with it...". If the "bringing" clause modifies "Messina," then "Messina" is the subject of "bringing" and must be the referent for "it." If the clause modifies "port," then "port" is the desired referent; if the clause is attached at the level of the main verb of the sentence, then "ship" is the referent.
But syntactic relations do not suffice to resolve anaphora: Hobbs' (1976) algorithm for resolving the reference of pronouns, depending only on the surface syntax of sentences in the text, when applied to "it" in the example paragraph, fails in both cases to identify the most likely referent NP.

Adding selectional restrictions (semantic feature information, Hobbs 1977) does not solve the problem, because isolated features offer only part of the background knowledge necessary for reference disambiguation. Later, Hobbs (1979, 1982) proposed a knowledge base in which information about language and the world would be encoded, and he emphasized the need for using "salience" in choosing facts from this knowledge base.

We will investigate the possibility that the structure of this knowledge base can actually resemble the structure of, for example, natural language dictionaries. The process of finding referents could then be automated.
Determining that the most likely subject for "bringing," in the first sentence, is the noun "ship" is done in the following fashion. The first definition for "bring" in W7 (Webster's Seventh Collegiate Dictionary) is "to convey, lead, carry, or cause to come along with one..." The available possible subjects for "bringing" are "Messina," "port," and "ship." "Messina" is listed in the Pronouncing Gazetteer of W7, which means that it is a place (and is so identified in the subtitle of the Gazetteer). So we can substitute the word "place" for the word "Messina." Then we check the given definitions for the words "place," "port," and "ship" in both dictionaries. LDOCE (Longman Dictionary of Contemporary English) proves particularly useful at this point. Definitions for "place" begin: "a particular part of space...". Definitions for "port" include: "harbour..."; "an opening in the side of a ship...". But the first entry for "ship" in LDOCE reads "a large boat for carrying people or goods...". This demonstrates a very quick connection with the definition for the verb "bring," since the word "carry" occurs in both definitions. It requires much more time and effort to find a connection between "bring" and either of the other two candidate subject words "place" or "port." Similar techniques can be used to assign "disease" as the most probable referent for the second "it" anaphor in our example paragraph.
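The dictionary consultation just described can be mimicked mechanically. The following Python sketch (ours, not part of the original analysis; the abridged entries stand in for the W7 and LDOCE text quoted above) scores each candidate subject by the overlap between the content words of its definition and those of the verb's definition, and picks the best-scoring noun.

# A minimal sketch of the definition-overlap heuristic described above.
# The abridged "definitions" are illustrative stand-ins for the W7/LDOCE entries.
STOPWORDS = {"a", "an", "the", "of", "or", "for", "to", "in", "into", "on", "with", "and"}

definitions = {
    "bring": "to convey lead carry or cause to come along with one",
    "place": "a particular part of space",
    "port":  "harbour an opening in the side of a ship",
    "ship":  "a large boat for carrying people or goods",
}

def content_words(text):
    """Crude normalization: lowercase, drop stopwords, strip a final 's' or 'ing'."""
    words = set()
    for w in text.lower().split():
        if w in STOPWORDS:
            continue
        for suffix in ("ing", "s"):
            if w.endswith(suffix) and len(w) > len(suffix) + 2:
                w = w[: -len(suffix)]
        words.add(w)
    return words

def best_subject(verb, candidates):
    """Return the candidate whose definition shares most content words with the verb's
    definition (the 'carry' link between 'bring' and 'ship'), plus all the scores."""
    verb_words = content_words(definitions[verb])
    scores = {c: len(verb_words & content_words(definitions[c])) for c in candidates}
    return max(scores, key=scores.get), scores

subject, scores = best_subject("bring", ["place", "port", "ship"])
print(scores)    # {'place': 0, 'port': 0, 'ship': 1} -- only 'ship' shares 'carry'
print(subject)   # 'ship'

On this toy data the shared stem "carry" is the only signal, which is exactly the "very quick connection" noted above; a realistic version would of course need full dictionary entries and better morphology.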
Equally significant in this instance is the realization that a dictionary points to synonym and paraphrase relations, and thereby verifies the cohesiveness of the passage. Through the dictionary (LDOCE again), we establish shared-word relationships between and among the words "disease," "Black Death," "infection," "death," "killed," and "end." Note that there is no other means, short of appealing to human understanding or to some hand-coded body of predicate assertions, for making these relationships.

This demonstrates that information needed to identify and resolve anaphoric references can be found, to an interesting extent, in standard dictionaries and thesauri. (Other reference works could be treated as additional sources of world knowledge.) This type of consultation uses existing natural language texts as a referential level for processing purposes. It is the lack of exactly this notion of referential level that has stood in the way of other linguists who have been interested in the paragraph as a unit. Crothers (1979, p. 112), for example, bemoans the fact that his "theory lacks a world knowledge component, a mental 'encyclopedia,' which could be invoked to generate inferences...". With respect to that independent source of knowledge, our main contributions are two. First, we identify its possible structure (a collection of partially ordered theories) and make formal the choice of a most plausible interpretation. In other words, we recognize it as a separate logical level--the referential level. Second, we suggest that natural language reference works, like dictionaries and thesauri, can quite often fill the role of the referential level.
3. The Logic of Reference
The goal of this section is to introduce a formalism describing how background knowledge is used in understanding text. The term "logic of reference" denotes a formal description of this process of consulting various sources of information in order to produce an interpretation of a text. The formalism will be presented in a number of steps in which we will elaborate one simple example:

Example 1
Entering the port, a ship brought a disease.
This sentence can be translated into the logical formula (ignoring only the past tense of "bring" and the progressive of "enter"):

Definition
S: enter(x1, x2) & ship(x1) & port(x2) & bring(x3, x4) & disease(x4) & x1 = s & x2 = m & x3 = s & x4 = d,
where s, m, d are constants.
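For the sketches scattered through the rest of this section it is convenient to fix a toy machine representation of such an object-level theory (ours, not the paper's notation): a ground atom is a tuple of a predicate name and constants, a theory is a set of atoms, and the equalities have already been applied, so that x1 and x3 both become the constant s.

# A toy encoding of the object-level theory S as a set of ground atoms.
# (Illustrative only; the paper does not prescribe a data structure.)
S = {
    ("enter", "s", "m"),
    ("ship", "s"),
    ("port", "m"),
    ("bring", "s", "d"),
    ("disease", "d"),
}

def predicates(theory):
    """Pred(T): the set of predicate names occurring in a theory."""
    return {atom[0] for atom in theory}

print(sorted(predicates(S)))   # ['bring', 'disease', 'enter', 'port', 'ship']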
We adopt the three-level semantics as a formal tool for the analysis of paragraphs. This semantics was constructed (Zadrozny 1987a, 1987b) as a formal framework for default and commonsense reasoning. It should not come as a surprise that we can now use this apparatus for text/discourse analysis; after all, many natural language inferences are based on defaults, and quite often they can be reduced to choosing most plausible interpretations of predicates. For instance, relating "they" to "apples" in the sentence (cf. Haugeland 1985, p. 195; Zadrozny 1987a):

We bought the boys apples because they were so cheap
can be an example of such a most plausible choice.

The main ideas of the three-level semantics can be stated as follows:

1. Reasoning takes place in a three-level structure consisting of an object level, a referential level, and a metalevel.

2. The object level is used to describe the current situation, and in our case is reserved for the formal representation of paragraph sentences. For the sake of simplicity, the object level will consist of a first order theory.

3. The referential level, denoted by R, consists of theories representing background knowledge, from which information relevant to the understanding of a given piece of text has to be extracted. It constrains interpretations of the predicates of an object theory. Its structure and the extraction methods will be discussed below.

4. Understanding has as its goal construction of an interpretation of the text, i.e. building some kind of model. Since not all logically permissible models are linguistically appropriate, one needs a place, namely the metalevel, to put constraints on types of models. Gricean maxims belong there; Section 6 will be devoted to a presentation of the metalevel rules corresponding to them.
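Purely for orientation, the three levels can be pictured as a small data structure; the class names below are our own, and the contents are deliberately schematic.

# A schematic rendering of the three-level structure (object level, referential
# level R, metalevel M). The names and types here are ours, chosen only to make
# the division of labor concrete.
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Atom = Tuple[str, ...]            # ("predicate", constant, ...)
Theory = FrozenSet[Atom]          # a finite set of formulas

@dataclass
class ReferentialLevel:
    # For each predicate, its candidate mini-theories, listed from most to least
    # plausible (the partial order of Section 3.2 collapsed to a list).
    senses: Dict[str, List[Theory]] = field(default_factory=dict)

@dataclass
class MetaLevel:
    # Global constraints on admissible models, e.g. formalized Gricean maxims:
    # each constraint accepts or rejects a candidate model (here, a set of atoms).
    constraints: List[Callable[[Set[Atom]], bool]] = field(default_factory=list)

@dataclass
class ThreeLevelTheory:
    object_level: Set[Atom]        # logical translation of the paragraph sentences
    referential: ReferentialLevel  # background knowledge R
    meta: MetaLevel                # metalevel M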
We have shown elsewhere (Jensen and Binot 1988; Zadrozny 1987a, 1987b) that natural language programs, such as on-line grammars and dictionaries, can be used as referential levels for commonsense reasoning--for example, to disambiguate PP attachment. This means that information contained in grammars and dictionaries can be used to constrain possible interpretations of the logical predicates of an object-level theory.

The referential structures we are going to use are collections of logical theories, but the concept of reference is more general. Some of the intuitions we associate with this notion have been very well expressed by Turner (1987, pp. 7-8):

... Semantics is constrained by our models of ourselves and our worlds. We have models of up and down that are based by the way our bodies actually function. Once the word "up" is given its meaning relative to our experience with gravity, it is not free to "slip" into its opposite. "Up" means up and not down.... We have a model that men and women couple to produce offspring who are similar to their parents, and this model is grounded in genetics, and the semantics of kinship metaphor is grounded in this model. Mothers have a different role than fathers in this model, and thus there is a reason why "Death is the father of beauty" fails poetically while "Death is the mother of beauty" succeeds....

It is precisely this "grounding" of logical predicates in other conceptual structures that we would like to capture. We investigate here only the "grounding" in logical theories. However, it is possible to think about constraining linguistic or logical predicates by simulating physical experiences (cf. Woods 1987).

We assume here that a translation of the surface forms of sentences into a logical formalism is possible. Its details are not important for our aim of giving a semantic interpretation of paragraphs; the main theses of our theory do not depend on a logical notation. So we will use a very simple formalism, like the one above, resembling the standard first order language. But, obviously, there are other possibilities--for instance, the discourse representation structures (DRS's) of Kamp (1981), which have been used to translate a subset of English into logical formulas, to model text (identified with a list of sentences), to analyze a fragment of English, and to deal with anaphora. The logical
notation of Montague (1970) is more sophisticated, and may be considered another possibility. Jackendoff's (1983) formalism is richer and resembles more closely an English grammar. Jackendoff (1983, p. 14) writes "it would be perverse not to take as a working assumption that language is a relatively efficient and accurate encoding of the information it conveys." Therefore a formalism of the kind he advocates would probably be most suitable for an implementation of our semantics. It will also be a model for our simplified logical notation (cf. Section 5). We can envision a system that uses data structures produced by a computational grammar to obtain the logical form of sentences.
3.1 Finite Representations, Finite Theories
Unless explicitly stated otherwise, we assume that formulas are expressed in a certain (formal) language L without equality; the extension L(=) of L is going to be used only in Section 5 for dealing with noun phrase references. This means that natural language expressions such as "A is B," "A is the same as B," etc. are not directly represented by logical equality; similarly, "not" is often not treated as logical negation; cf. Hintikka (1985).

All logical notions that we are going to consider, such as theory or model, will be finitary. For example, a model would typically contain fewer than a hundred elements of different logical sorts. Therefore these notions, and all other constructs we are going to define (axioms, metarules, definitions etc.) are computational, although usually we will not provide explicit algorithms for computing them. The issues of control are not so important for us at this point; we restrict ourselves to describing the logic. This Principle of Finitism is also assumed by Johnson-Laird (1983), Jackendoff (1983), Kamp (1981), and implicitly or explicitly by almost all researchers in computational linguistics. As a logical postulate it is not very radical; it is possible within a finitary framework to develop that part of mathematics that is used or has potential applications in natural science, such as mathematical analysis (cf. Mycielski 1981).

On the other hand, a possible obstacle to our strategy of using only finite objects is the fact that the deductive closure of any set of formulas is not finite in standard logic, while, clearly, we will have to deduce new facts from formal representations of text and background knowledge. But there are several ways to avoid this obstruction. For example, consider theories consisting of universal formulas without function symbols. Let Th(T) of such a theory T be defined as T plus ground clauses/sets of literals provable from T in standard logic. It is easily seen that it is a closure, i.e. Th(Th(T)) = Th(T); and obviously, it is finite, for finite T. It makes sense then to require that logical consequences of paragraph sentences have similar finite representations. However, in order not to limit the expressive power of the formal language, we should proceed in a slightly different manner. The easiest way to achieve the above requirement is by postulating that all universes of discourse are always finite, and therefore all quantifiers actually range over finite domains. In practice, we would use those two and other tricks: we could forbid more than three quantifier changes, because even in mathematics more than three are rare; we could restrict the size of universes of discourse to some large number such as 1001; we could allow only a fixed finite nesting of function symbols (or operators) in formulas; etc. The intention of this illustration was to convince the reader that we now can introduce the following set of definitions.

Definitions

A theory is a finite subset of Sent, the set of sentences (formulas without free variables in some formal language).
A deductive closure operator is a function Th : P(Sent) → P(Sent) such that:
(a) T ⊆ Th(T), for any T,
(b) Th(Th(T)) = Th(T),
(c) Th(T) is finite, for finite T; additionally, we require it to be ground, for ground T.

A theory T is consistent if there is no formula φ such that both φ and ¬φ belong to Th(T) (and inconsistent otherwise).

A model of a theory T is defined, as usual, as an interpretation defined on a certain domain, which satisfies all formulas of T. The collection of all (finite) models of a theory T will be denoted by Mods(T).

The set of all subformulas of a collection of formulas F is denoted by Form(F). φ is a ground instance of a formula G if φ contains no variables, and φ = Gθ, for some substitution θ.

Thus, we do not require Th(T) to be closed under substitution instances of tautologies. Although in this paper we take modus ponens as the main rule of inference, in general one can consider deductive closures with respect to weaker, nonstandard logics (cf. Levesque 1984; Frisch 1987; Patel-Schneider 1985). But we won't pursue this topic further here.
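As one concrete instance of such a finite closure operator, the sketch below computes Th(T) for ground facts plus ground Horn rules by forward chaining to a fixpoint; for this restricted input it satisfies (a)-(c) trivially. (This is our simplification: the paper allows arbitrary universal formulas without function symbols.)

# Th(T) for ground facts plus ground Horn rules (body atoms => head atom),
# computed by iterating modus ponens to a fixpoint. Negation is represented,
# as elsewhere in these sketches, by prefixing an atom with "not".
def closure(facts, rules):
    """facts: set of ground atoms; rules: list of (frozenset_of_body_atoms, head_atom)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

def consistent(facts, rules):
    """Inconsistent iff some atom and its ("not", ...) counterpart are both derivable."""
    th = closure(facts, rules)
    return not any(("not",) + atom in th for atom in th if atom[0] != "not")

# Example: enter(s, m) plus the ground instance of enter(x, y) -> come_in(x, y).
facts = {("enter", "s", "m")}
rules = [(frozenset({("enter", "s", "m")}), ("come_in", "s", "m"))]
print(closure(facts, rules) == facts | {("come_in", "s", "m")})   # True
print(consistent(facts, rules))                                   # True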
3.2 The Structure of Background Knowledge
Background knowledge is not a simple list of meaning postulates--it has a structure and it may contain contradictions and ambiguities. These actualities have to be taken into account in any realistic model of natural language understanding. For instance, the verb "enter" is polysemous. But, unless context specifies otherwise, "to come in" is a more plausible meaning than "to join a group." Assuming some logical representation of this knowledge, we can write that

enter(x, y) → {come_in(x, y); place(y)}       (e1)
enter(x, y) → {join(x, y) & group(y); ...}    (e2)

and e2 <enter e1.
Two things should be explained now about this notation:

Meanings of predicates/words are represented on the right-hand sides of the arrows as collections of formulas--i.e., theories. The main idea is that these mini-theories of predicates appearing in a paragraph will jointly provide enough constraints to exclude implausible interpretations. (One can think of meanings as regions in space, and of constraints as sets of linear inequalities approximating these regions). How this can be done, we will show in a moment.

These theories are partially ordered; and their partial orders are written as <enter(x,y), or <enter, or <i, or simply <, depending on context. This is our way of making formal the asymmetry of plausibility of different meanings of a predicate. Again, a way of exploiting it will be shown below.
Definition

A referential level R is a structure

R = {(φ, <φ) : φ ∈ Formulae}

where, for each φ, <φ is a partially ordered (by a relation of plausibility) collection of implications φ → T. The term φ → T stands for the theory {φ → τ : τ ∈ T}, and φ → {τ1, τ2, ...} abbreviates {φ → τ1, φ → τ2, ...}.

It is convenient to assume also that all formulas, except "laws"--which are supposed to be always true--have the least preferred empty interpretation ∅.

We suppose also that interpretations are additionally ranked according to the canonical partial ordering on subformulas. The ranking provides a natural method of dealing with exceptions, as in the case of finding an interpretation of α & β with R containing (α → γ), (α & β → ¬γ), where ¬γ would be preferred to γ if both are consistent, and both defaults are equally preferred. This means that preference is given to more specific information. For instance, the sentence The officer went out and struck the flag will get the reading "lowered the flag," if the appropriate theory of strike(x, y) & flag(y) is part of background knowledge; if not, it will be understood as "hit the flag."
The referential level (R, <) may contain the theories listed below. Since we view a dictionary as an (imperfect) embodiment of a referential level, we have derived the formulas in every theory Tφ from a dictionary definition of the term φ. We believe that even such a crude model can be useful in practice, but a refinement of this model will be needed to have a sophisticated theory of a working natural language understanding system.
enter(x, y) → {come_in(x, y); place(y); ...}                               (e1)
/* enter -- to come into a place */

enter(x, y) → {join(x, y) & group(y); typically: professionals(y)}         (e2)
/* enter -- to join a group; typically of professionals */

ship(x) → {large_boat(x); ∃y carry(x, y) & (people(y) ∨ goods(y)); ...}    (sh1)
/* ship -- a large boat for carrying people or goods on the sea */

ship(x) → {(large_aircraft(x) ∨ space_vehicle(x)); ...}                    (sh2)

bring(x, y) → {carry(x, y); ...}                                           (b1)
/* bring -- to carry */

bring(x, y) → {cause(x, y); ...}                                           (b2)
/* bring -- to cause */

disease(y) → {illness(y); ...}                                             (d1)
/* disease -- illness caused by an infection */
disaster(y) → {...; ∃x cause(y, x) & harm(x); ...}                         (dr1)
/* disaster -- causes a harm */

port(x) → {harbor(x); ...}                                                 (p1)
/* port -- harbor */

(...)
Note: We leave undefined the semantics of adverbs such as typically in (e2). This adverb appears in the formula as an operator; our purpose in choosing this representation is to call the reader's attention to the fact that for any real applications the theories will have to be more complex and very likely written in a higher order language (cf. Section 4).
The theories, which we describe here only partially, restricting ourselves to their relevant parts, represent the meanings of concepts. We assume as before that (e1) is more plausible than (e2), i.e. e2 <enter e1; similarly for (sh1), (sh2) and <ship, etc. This particular ordering of theories is based on the ordering of meanings of the corresponding words in dictionaries (derived and less frequent meanings have lower priority). But one can imagine deriving such orderings by other means, such as statistics.

The partial order <enter has the theories {e1, e2, ∅} as its domain; ∅ is the least preferred empty interpretation corresponding to our lack of knowledge about the predicate; it is used when both (e1) and (e2) are inconsistent with a current object theory. The domain is ordered by the relation of preference ∅ <enter e2 <enter e1. The theory (e1) will always be used in constructing theories and models of paragraphs in which the expression "enter" (in any grammatical form) appears, unless assuming it would lead to an inconsistency. In such a case, the meaning of "to enter" would change to (e2), or some other theory belonging to R.

We would like to stress three points now: (1) the above implications are based on the definitions that actually occur in dictionaries; (2) the ordering can actually be found in some dictionaries--it is not our own arbitrary invention; (3) it is natural to treat a dictionary definition as a theory, since it expresses "the analysis of a set of facts in their relation to one another," different definitions corresponding to possible different analyses. (Encyclopedia articles are even more theory-like.) In this sense, the notion of a referential level is a formalization of a real phenomenon.
Obviously, dictionaries or encyclopedias do not include all knowledge an agent must have to function in the world, or a program should possess in order to understand any kind of discourse. Although the exact boundary between world knowledge and lexical knowledge is impossible to draw, we do know that lexicons usually contain very little information about human behavior or temporal relations among objects of the world. Despite all these differences, we may assume that world knowledge and lexical knowledge (its proper subset) have a similar formal structure. And in the examples that we present, it is the structure that matters.
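Before moving on, it may help to see the entries of this section in machine-readable form. The encoding below is ours (the paper does not prescribe one): each predicate maps to its sense-theories listed from most to least plausible, with the empty theory as the last resort, and a sense-theory is a set of schematic atoms over the variables x, y, z, existential quantifiers being flattened into extra schematic variables.

# The example referential level R as predicate -> ordered list of (label, sense-theory).
# Earlier in the list = more plausible (e1 before e2, sh1 before sh2, ...); the
# empty theory plays the role of the least preferred interpretation (the "∅" above).
R = {
    "enter":    [("e1",  {("come_in", "x", "y"), ("place", "y")}),
                 ("e2",  {("join", "x", "y"), ("group", "y")}),
                 ("e0",  set())],
    "ship":     [("sh1", {("large_boat", "x"), ("carry", "x", "z")}),
                 ("sh2", {("large_aircraft_or_space_vehicle", "x")}),
                 ("sh0", set())],
    "bring":    [("b1",  {("carry", "x", "y")}),
                 ("b2",  {("cause", "x", "y")}),
                 ("b0",  set())],
    "disease":  [("d1",  {("illness", "y")}),
                 ("d0",  set())],
    "disaster": [("dr1", {("cause", "y", "z"), ("harm", "z")}),
                 ("dr0", set())],
    "port":     [("p1",  {("harbor", "x")}),
                 ("p0",  set())],
}

def instantiate(sense, bindings):
    """Substitute constants for the schematic variables, e.g. {'x': 's', 'y': 'm'}."""
    return {tuple(bindings.get(t, t) for t in atom) for atom in sense}

print(instantiate(R["enter"][0][1], {"x": "s", "y": "m"}) ==
      {("come_in", "s", "m"), ("place", "m")})   # True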
3.3 How to Use Background Knowledge
The next few pages will be devoted to an analysis of the interaction of background knowledge with a logical representation of a text. We will describe two modes of such an interaction; both seem to be present in our understanding of language. One exploits differences in plausibility of the meanings of words and phrases, in the absence of context (e.g., the difference between a central and a peripheral sense, or between a frequent and a rare meaning). The other one takes advantage of connections between those meanings. We do not claim that this is the only possible such analysis; rather, we present a formal model which can perhaps be eventually disproved and replaced by a better one. As far as we know, this is the first such formal proposal.
3.3.1 Dominance. In Figure 1 the theories of "enter," "ship," etc. and the partial orders are represented graphically; more plausible theories are positioned higher. A path through this graph chooses an interpretation of the sentence S. For instance, the path f_int = {e1, sh1, p1, b1, d1} and S say together that

A large boat (ship) that carries people or goods came into the harbor and carried a disease (illness).

Since it is the "highest" path, f_int is the most plausible (relative to R) interpretation of the words that appear in the sentence. Because it is also consistent, it will be chosen as a best interpretation of S (cf. Zadrozny 1987a, 1987b). Another theory, consisting of f' = {e1, sh2, p1, b2, d1} and S, saying that

A space vehicle came into the harbor and caused a disease/illness

is less plausible according to that ordering. As it turns out, f' is never constructed in the process of building an interpretation of a paragraph containing the sentence S, unless assuming f_int would lead to a contradiction, for instance within the higher level context of a science fiction story.

The collection of these most plausible consistent interpretations of a given theory T is denoted by PT<(T). Then f_int belongs to PT<(Th({S})), but this is not true for f'.

Note: One should remember that, in general, because all our orderings are partial, there can be more than one most plausible interpretation of a sentence or a paragraph, and more than one "next best" interpretation. Moreover, to try to impose a total order on all the paths (i.e. the cartesian product defined in Section 3.3.2) would be a mistake; it would mean that ambiguities, represented in our formalism by existence of more than one ("best") interpretation of a text, are outlawed.
3.3.2 Cartesian Products. Formalization of the interpretation produced in 3.3.1 is presented below. Any path through the graph of Figure 1 is an element of the cartesian product Π_{φ ∈ subformulas(S)} <φ of the partial orderings.

Figure 2 explains the geometric intuitions we associate with the product and the ordering. The product itself is given by the following definition:

Definition
Let F be a collection of formulas φe, e ∈ m, for some natural number m; and let, for each e, <e be a partial ordering on the collection of theories {φe → T1, ..., φe → Tne} of φe. Define:

Π(F) = Π_{e ∈ m} <e = {f : (∀e ∈ m)(∃l ≤ ne) f(e) = (φe → Tl)}

We denote by < the partial order induced on Π(F) by the orderings <e and the canonical ordering of subformulas (a formula is "greater" than its subformulas). The geometrical meaning of this ordering can be expressed as "higher paths are more important provided they pass through the most specific nodes."
Figure 1
The partial ordering of theories of the referential level R and the ordering of interpretations. Since (sh1) and (b1) dominate (respectively) (sh2) and (b2), the path f_int represents a more plausible interpretation than f'.
<1 < 2 < 5
" f l a g" " s trike " " s t ri ke ~ f l a g"
( 2 )
" c loth " " h i t /a n ge r" " l owe r" j
( I )
( 4 )
" t a i l " ( 3 )
"music"
Figure 2
The cartesian product Π <i = <1 × <2 × <3 can be depicted as a collection of all paths through the graphs representing the partial orderings; a path chooses only one element from each ordering--thus (1) and (2) are "legal" paths, while (4) is not. Also, more plausible theories appear higher: "cloth" > "music" > ∅. More specific paths are preferred: Assuming that all higher paths, like path (1), are excluded by inconsistency, path (2) is the most plausible interpretation of α & β, and it is preferred to (3). (More explanations in the text.)
To make Figure 2 more intuitive we assigned some meanings to the partial orders. Thus, <1 represents some possible meanings of "flag," shown with the help of the "key words"; the meaning "piece of cloth" is preferred to "deer's tail." The word "strike" has dozens of meanings, and we can imagine the meaning of the transitive verb being represented by <2, with "hit in anger" at the top, then "hit, e.g. a ball" and "discover" equally preferred, and then all other possible meanings. The trivial <3 representing "strike a flag" should remind us that we already know all that from Section 3.2. Notice that path (2) does not give us the correct interpretation of "strike the flag," which is created from "cloth" and "lower."
Each element of the cartesian product Π <i represents a set of possible meanings. These meanings can be combined in various ways, the simplest of which consists of taking their union as we did in 3.3.1. But a paragraph isn't just a sum of its sentences, as a sentence isn't simply a concatenation of its phrases. The cohesion devices--such as "but," "unless," "since"--arrange sentences together, and they also have semantic functions. This is reflected, for instance, in the way various pieces of background knowledge are pasted together. Fortunately, at this point we can abstract from this by introducing an operator variable ⊕ whose meaning will be, as a default, that of a set theoretic union, ∪; but, as we describe it in Section 6.2, it can sometimes be changed to a more sophisticated join operator. There, when considering the semantics of "but," we'll see that referential level theories can be combined in slightly more complicated ways. In other words, a partial theory corresponding to a paragraph cannot be just a sum of the theories of its sentences--the arrangement of those theories should obey the metalevel composition rules, which give the semantics of connectives. However, from a purely formal point of view, ⊕ can be any function producing a theory from a collection of theories.

The cartesian product represents all possible amalgamations of these elementary theories. In other words, this product is the space of possible combinations of meanings, some of which will be inconsistent with the object level theory T. We can immediately exclude the inconsistent combinations, eliminating at least some nonsense:

Π̂(F) = {f ∈ Π(F) : f is consistent with T}

It remains now to fill in the details of the construction of PT<. We assume that a text P can be translated into a (ground) theory P̂ (a set of logical sentences); T = Th(P̂) is the set of logical consequences of P̂. We denote by F the set Form(Th(P̂))--the set of all subformulas of Th(P̂), about which we shall seek information at the referential level R. If F = {φ1(c̄1), ..., φn(c̄n)} (c̄i is a collection of constants that are arguments of φi) is this theory, we have to describe a method of augmenting it with the background knowledge. We can assume without loss of generality that each φi(x̄i) in F has, in R, a corresponding partial order <i of theories of φi(x̄i). We now substitute the constants c̄i for the variables x̄i inside the theories of <i. With a slight abuse of notation, we will use the same symbol <i for the new ordering. The product spaces Π(F) and Π̂(F) can then be defined as before, with the new orderings in place of the ones with variables. Notice that if only some of the variables of φi(x̄i) were bound by c̄i, the same construction would work. We have arrived then at a general method of linking object level formulas with their theories from R.

Now we can define PT<(T) of the theory T as the set of most likely consistent theories of T given by (Π̂(F), <), where F = Form(T):

PT<(T) = {T ∪ T' : T' = ⊕f and f is a maximal element of (Π̂(F), <)}

Notice that PT<(T) can contain more than one theory, meaning that T is ambiguous. This is a consequence of the fact that the cartesian product is only partially ordered by <. The main reason for using ground instances φi(c̄i) in modifying the orderings is the need to deal with multiple occurrences of the same predicate, as in

John went to the bank by the bank.
The above construction is also very close in spirit to Poole's (1988) method for default reasoning, where object theories are augmented by ground instances of defaults.
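For a handful of predicates, Π(F), Π̂(F) and PT< can simply be enumerated; the sketch below does so for a two-word toy theory, using a componentwise comparison of sense ranks as a stand-in for the full ordering < (which, as described above, also prefers more specific subformulas).

# Enumerate the cartesian product of sense choices, keep those consistent with T
# (Π̂(F)), and then keep the maximal ones under a componentwise plausibility order.
from itertools import product

R = {
    "ship":  [("sh1", {("large_boat", "x")}), ("sh2", {("space_vehicle", "x")}), ("sh0", set())],
    "bring": [("b1", {("carry", "x", "y")}), ("b2", {("cause", "x", "y")}), ("b0", set())],
}
T = {("ship", "s"), ("bring", "s", "d"), ("not", "carry", "s", "d")}

def instantiate(sense, atom):
    b = dict(zip(("x", "y"), atom[1:]))
    return {tuple(b.get(t, t) for t in a) for a in sense}

def consistent(atoms):
    return not any(("not",) + a in atoms for a in atoms if a[0] != "not")

atoms = sorted(a for a in T if a[0] in R)
choices = [[(a[0], rank, label, instantiate(sense, a))
            for rank, (label, sense) in enumerate(R[a[0]])] for a in atoms]

Pi = list(product(*choices))                                  # Π(F)
Pi_hat = [f for f in Pi                                       # Π̂(F): consistent with T
          if consistent(set(T).union(*[c[3] for c in f]))]

def dominates(f, g):
    """f is at least as plausible as g in every component, and differs somewhere."""
    return all(cf[1] <= cg[1] for cf, cg in zip(f, g)) and f != g

maximal = [f for f in Pi_hat if not any(dominates(g, f) for g in Pi_hat)]
for f in maximal:
    print([(c[0], c[2]) for c in f])
# [('bring', 'b2'), ('ship', 'sh1')] -- b1 ("carry") is ruled out by the "not carry" fact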
3.3.3 Coherence Links. The reasoning that led to the intended interpretation f_int in our discussion of dominance was based on the partial ordering of the theories of R. We want to exploit now another property of the theories of R--their coherence. Finding an interpretation for a natural language text or sentence typically involves an appeal to coherence. Consider

S2: Entering the port, a ship brought a disaster.

Using the coherence link between (b2) and (dr1) (cf. Section 3.2)--the presence of cause(*, *) in the theories of "bring" and "disaster"--we can find a partial coherent interpretation T ∈ PTC(Th({S2})) of S2. In this interpretation, theories explaining the meanings of terms are chosen on the basis of shared terms. This makes (b2) ("to bring" means "to cause") plausible and therefore it would be included in T. The formalization of all this is given below:
Definitions

The set G(T) of all theories about the formulas of T consists of the theories φ → Tφ belonging to R with φ ∈ Form(T). Here, we ignore the ordering, because we are interested only in connections between concepts (represented by words).

If t, t' ∈ G(T), t ≠ t', share a predicate, we say that there is a c-link between t and t'. A c-path is defined as a chain of c-links; i.e. if t = φ → T and t' = φ' → T' belong to a c-path, then φ ≠ φ'. Under this condition, for any predicate, only one of its theories will belong to a c-path. A c-path therefore chooses only one meaning for each term.

C(T) will denote the set of all c-paths in G(T) consistent with T, i.e. for each p ∈ C(T), ⊕p ∪ T is consistent. This construction is like the one we have encountered when defining Π̂(T). The details should be filled out exactly as before; we leave this to the reader.

We define PTC(T) of a theory T as the set of most coherent consistent theories of T given by C(T):

PTC(T) = {T ∪ T' : T' = ⊕p and p is a ⊆-maximal element of C(T)}

Going back to S2, PTC(Th(S2)) contains also the interpretation based on the coherence link between "ship" and "bring," which involves "carry." Based on the just-described coherence relations, we conclude that sentence S2 is ambiguous; it has two interpretations, based on the two senses of "bring." Resolution of ambiguities involves factors beyond the scope of this section--for instance, Gricean maxims and topic (Section 6), or various notions of context (cf. Small et al. 1988). We will continue the topic of the interaction of object level theories with background knowledge by showing how the two methods of using background knowledge can be combined.
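In the same toy setting, coherence can be scored directly: two sense-theories are c-linked when they share a predicate name, and the combinations with the most c-links are kept. For S2 the "cause" link (b2-dr1) and the "carry" link (b1-sh1) tie, so both readings survive, which is exactly the ambiguity just described; the code below is an illustration of the idea, not the paper's exact construction of C(T).

# Score sense combinations by coherence: the number of pairs of chosen sense-theories
# sharing a predicate name (a "c-link"), then keep all maximally coherent combinations.
from itertools import combinations, product

R = {
    "bring":    [("b1", {("carry", "x", "y")}), ("b2", {("cause", "x", "y")})],
    "disaster": [("dr1", {("cause", "y", "z"), ("harm", "z")})],
    "ship":     [("sh1", {("large_boat", "x"), ("carry", "x", "z")})],
}

def preds(sense):
    return {atom[0] for atom in sense}

def coherence(combo):
    return sum(1 for (_, s1), (_, s2) in combinations(combo, 2) if preds(s1) & preds(s2))

combos = list(product(*R.values()))           # one sense per word, as in a c-path
top = max(coherence(c) for c in combos)
for combo in (c for c in combos if coherence(c) == top):
    print([label for label, _ in combo])
# ['b1', 'dr1', 'sh1']   -- the "carried" reading (carry links bring and ship)
# ['b2', 'dr1', 'sh1']   -- the "caused" reading (cause links bring and disaster)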
3.3.4 Partial Theories. A finer interpretation of an object level theory T--its partial theory--is obtained by the iteration:

PT(T) = PT<(PTC(Th(T)))

PT is well defined after we specify that PT of a set of theories is the set of the PTs (for both < and C):

PTx({T1, T2, ...}) = PTx(T1) ∪ PTx(T2) ∪ ..., for x ∈ {<, C}.

Notice that coherence does not decide between (e1) and (e2) given the above R, but the iteration produces two theories of S2, both of which assert that the meaning of "ship entered" is "ship came":

A ship/boat came into the harbor/port and caused/brought a disaster.
A ship/boat came into the harbor/port and carried/brought a disaster.

PT({S1}) contains only one interpretation, based on f_int:

A ship/boat came into the harbor/port and carried/brought a disease.

Partial theories will be the main syntactic constructs in the subsequent sections. In particular, the p-models will be defined as some special models of partial theories of paragraphs.
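The composition PT(T) = PT<(PTC(Th(T))), together with the lifting of each operator to sets of theories, has a very simple computational shape; in the sketch below the three stages are deliberately trivial stand-ins (Th is the identity on ground facts, and the "readings" and the preference are toy values), since the real content of each stage was given in the preceding subsections.

# The shape of the pipeline PT(T) = PT_<(PT_C(Th(T))), with each operator lifted to
# sets of theories: PT_x({T1, T2, ...}) = PT_x(T1) ∪ PT_x(T2) ∪ ...
def Th(theory):
    return frozenset(theory)                       # stand-in finite closure

def lift(op):
    """PT_x of a set of theories is the union of PT_x of its members."""
    return lambda theories: set().union(*(op(t) for t in theories)) if theories else set()

def PT_C(theory):
    # Stand-in: each theory yields its coherent expansions; here, two toy readings.
    return {theory | {("reading", 1)}, theory | {("reading", 2)}}

def PT_lt(theory):
    # Stand-in: keep only theories containing the preferred reading.
    return {theory} if ("reading", 1) in theory else set()

def PT(theory):
    return lift(PT_lt)(lift(PT_C)({Th(theory)}))

result = PT({("ship", "s")})
print(result == {frozenset({("ship", "s"), ("reading", 1)})})   # True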
3.4 Summary and Discussion
We have shown that finding an interpretation of a sentence depends on two graph-theoretical properties--coherence and dominance. Coherence is a purely "associative" property; we are interested only in the existence of links between represented concepts/theories. Dominance uses the directionality of the partial orders.

A partial theory PT(T) of an object theory T corresponding to a paragraph is obtained by joining most plausible theories of sentences, collocations, and words of the paragraph. However, this simple picture must be slightly retouched to account for semantic roles of inter- and intra-sentential connectives such as "but," and to assure consistency of the partial theory. These modifications have complicated the definitions a little bit. The above definitions capture the fact that even if, in principle, any consistent combination of the mini-theories about predicates can be extended to an interpretation, we are really interested only in the most plausible ones. The theory PT(T) is called "partial" because it does not contain all knowledge about predicates--less plausible properties are excluded from consideration, although they are accessible should an inconsistency appear. Moreover, the partiality is related to the unutilized possibility of iterating the operator PT (cf. Section 4).

How can we now summarize what we have learned about the three logical levels? To begin with, one should notice that they are syntactically distinct. If object level theories are expressed by collections of first order formulas, metalevel definitions--e.g., to express as a default that ⊕ is a set theoretical union--require another language,
such as higher order logic or set theory, where one can define predicates dealing with models, consistency, and provability. Even if all background knowledge were described, as in our examples, by sets of first order theories, because of the preferences and inconsistencies of meanings, we could not treat R as a flat database of facts--such a model simply would not be realistic. Rather, R must be treated as a separate logical level for these syntactic reasons, and because of its function--being a pool of possibly conflicting semantic constraints.

The last point may be seen better if we look at some differences between our system and KRYPTON, which also distinguishes between an object theory and background knowledge (cf. Brachman et al. 1985). KRYPTON's A-box, encoding the object theory as a set of assertions, uses standard first order logic; the T-box contains information expressed in a frame-based language equivalent to a fragment of FOL. However, the distinction between the two parts is purely functional--that is, characterized in terms of the system's behavior. From the logical point of view, the knowledge base is the union of the two boxes, i.e. a theory, and the entailment is standard. In our system, we also distinguish between the "definitional" and factual information, but the "definitional" part contains collections of mutually excluding theories, not just of formulas describing a semantic network. Moreover, in addition to proposing this structure of R, we have described the two mechanisms for exploiting it, "coherence" and "dominance," which are not variants of the standard first order entailment, but abduction.

The idea of using preferences among theories is new, hence it was described in more detail. "Coherence," as outlined above, can be understood as a declarative (or static) version of marker passing (Hirst 1987; Charniak 1983), with one difference: the activation spreads to theories that share a predicate, not through the IS-A hierarchy, and is limited to elementary facts about predicates appearing in the text.

The metalevel rules we are going to discuss in Section 6, and that deal with the Gricean maxims and the meaning of "but," can be easily expressed in the languages of set theory or higher order logic, but not everything expressible in those languages makes sense in natural language. Hence, putting limitations on the expressive power of the language of the metalevel will remain as one of many open problems.
4. Coherence of Paragraphs
We are now in a position to use the notion of the referential level in a formal definition of coherence and topic. Having done that, we will turn our attention to the resolution of anaphora, linking it with the provability relation (abduction) ⊢R+M and a metarule postulating that a most plausible model of a paragraph is one in which anaphors have references. Since the example paragraph we analyze has only one connective ("and"), we can postpone a discussion of connectives until Section 6.
Building an interpretation of a paragraph does not mean finding all of its possible meanings; the implausible ones should not be computed at all. This viewpoint has been reflected in the definition of a partial theory as a most plausible interpretation of a sequence of predicates. Now we want to restrict the notion of a partial theory by introducing the formal notions of topic and coherence. We can then later (Section 5.2) define p-models, a category of models corresponding to paragraphs, as models of coherent theories that satisfy all metalevel conditions.
The partial theories pick up from the referential level the most obvious or the most important information about a formula. This immediate information may be insufficient to decide the truth of certain predicates. It would seem therefore that the iteration of the PT operation to form a closure is needed (cf. Zadrozny 1987b). However, there are at least three arguments against iterating PT. First of all, iteration would increase the complexity of building a model of a paragraph; infinite iteration would almost certainly make such a construction impossible in real time. Secondly, the cooperative principle of Grice (1975, 1978), under the assumption that the referential levels of a writer and a reader are quite similar, implies that the writer should structure the text in a way that makes the construction of his intended model easy for the reader; and this seems to imply that he should appeal only to the most direct knowledge of the reader. Finally, it has been shown by Groesser (1981) that the ratio of derived to explicit information necessary for understanding a piece of text is about 8:1; furthermore, our reading of the analysis of five paragraphs by Crothers (1979) strongly suggests that only the most direct or obvious inferences are being made in the process of building a model or constructing a theory of a paragraph. Thus, for example, we can expect that in the worst case only one or two steps of such an iteration would be needed to find answers to wh-questions.
Let P be a paragraph, and let P̃ = (S1, ..., Sn) be its translation into a sequence of logical formulas. The set of all predicates appearing in X will be denoted by Pred(X).
Definition
Let T be a partial theory of a paragraph P. A sequence of predicates appearing in P̃, denoted by Tp, is called a topic of the paragraph P if it is a longest sequence satisfying conditions (1) and (2) below:

1. For all "sentences" Si, (a) or (b) or (c) holds:
   (a) Direct reference to the topic:
       Tp ⊆ Pred(Si)
   (b) Indirect reference to the topic:
       If π ∈ Pred(Si) & (π → Tπ) ∈ T, then Tp ⊆ Pred(Tπ)
   (c) Direct reference to a previous sentence:
       If π ∈ Pred(Si) & (π → Tπ) ∈ T, then Pred(Si−1) ∩ Pred(π → Tπ) ≠ ∅

2. Either (i) or (ii) is satisfied:
   (i) Existence of a topic sentence: Tp ⊆ Pred(Si), for some sentence Si;
   (ii) Existence of a topic theory: a theory of Tp belongs to R, i.e. if θ is the conjunction of predicates of Tp, then θ → Tθ ∈ R, for some Tθ.
The last two conditions say that either the discussed concept (topic) already exists in the background knowledge or it must be introduced in a sentence. For instance, we can see that the sentence The effect of the Black Death was appalling can be assumed to be a topic sentence.
The first three conditions state the requirements for a collection of sentences to have a topic. Either every sentence talks about the topic (as, for instance, the first two sentences of the paragraph about the Black Death), or a sentence refers to the topic through background knowledge, i.e. the topic appears in a theory about an entity or a relation of the sentence (in the case of Within twenty-four hours of infection..., "infection" can be linked to "disease"; cf. Sections 2 and 4.2), or else a sentence elaborates a fragment of the previous sentence (the theme The effect of... being developed in In less than...).
The definition allows a paragraph to have more than one topic. For instance, a paragraph consisting of

John thinks Mary is pretty. John thinks Mary is intelligent. John wants to marry her.

can be either about {John(j), Mary(m), think(j, m, pretty(m))}, or about John, Mary, and marrying. (Notice that condition 2(i) forbids us from merging the two topics into a larger one.) Thus paragraphs can be ambiguous about what constitutes their topics. The point is that they should have one.
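A small Python sketch may make the conditions concrete. It checks only conditions (1) and (2) for a given candidate sequence (maximality, i.e. choosing a longest such sequence, is left out), and the encoding is ours: sentences are sets of predicate names, T is a dictionary standing in for the mini-theories π → Tπ, and R_topics collects predicate combinations that have their own theory in R.

    # Sketch of the topic test; all data structures are our own simplification.
    def satisfies_topic_conditions(tp, sentences, T, R_topics):
        tp = set(tp)
        # Condition 1: each sentence refers to the topic directly (a),
        # indirectly through a mini-theory (b), or elaborates a fragment
        # of the previous sentence (c).
        for i, preds in enumerate(sentences):
            direct = tp <= preds
            indirect = any(p in T and tp <= T[p] for p in preds)
            previous = i > 0 and any(p in T and sentences[i - 1] & (T[p] | {p})
                                     for p in preds)
            if not (direct or indirect or previous):
                return False
        # Condition 2: a topic sentence exists (i), or Tp has a theory in R (ii).
        return any(tp <= preds for preds in sentences) or frozenset(tp) in R_topics

    # The John/Mary paragraph, with no background mini-theories supplied:
    S = [{"john", "mary", "think", "pretty"},
         {"john", "mary", "think", "intelligent"},
         {"john", "mary", "want", "marry"}]
    print(satisfies_topic_conditions({"john", "mary"}, S, {}, set()))           # True
    print(satisfies_topic_conditions({"john", "mary", "marry"}, S, {}, set()))  # False here

With an appropriate mini-theory linking "marry" to the earlier sentences, the second call could also succeed, which mirrors the ambiguity just discussed.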
It is also clear that what constitutes a topic depends on the way the content of paragraph sentences is represented. In the last case, if "pretty" were translated into a predicate, and not into a modifier of m (i.e. an operator), "John thinking about Mary" could not be a topic, for it wouldn't be the longest sequence of predicates satisfying conditions (1) and (2).
We'd like to put forward a hypothesis that this relationship between topics and representations can actually be useful: because the requirement that a well-formed paragraph should have a topic is a very natural one (and we can judge pretty well what can be a topic and what can't), we obtain a new method for judging semantic representations. Thus, if the naive first order representation containing pretty(m) as one of the formulas gives a wrong answer as to what is the topic of the above, or another, paragraph, we can reject it in favor of a (higher order) representation in which adjectives and adverbs are operators, not predicates, and which provides us with an intuitively correct topic. Such a method can be used in addition to the standard criteria for judging representations, such as elegance and ability to express semantic generalizations.
Definition
A partial theory T ∈ PT(P) of the paragraph P is coherent iff the paragraph P has a topic.

A random permutation of just any sentences about a disease wouldn't be coherent. But it would be premature to jump to the conclusion that we need more than just the existence of a topic as a condition for coherence. Although it may be the case that it will be necessary in the future to introduce notions like "temporal coherence," "deictic coherence," or "causal coherence," there is no need to start multiplying beings now. We can surmise that the random permutations we talk about would produce an inconsistent theory; hence the temporal, causal, and other aspects would be dealt with by consistency. But of course at this point this is just a hypothesis.
An important aspect of the definition is that coherence has been defined as a property of representation; in our case, it is a property of a formal theory. The existence of the topic, the direct or indirect allusion to it, and anaphora (which will be addressed below) take up the issue of formal criteria for a paragraph definition, which was raised by Bond and Hayes (1983) (cf. also Section 2.1). The question of paragraph length can probably be attended to by limiting the size of p-models, perhaps after introducing some kind of metric on logical data structures.
Still, our definition of coherence may not be restrictive enough: two collections of sentences, one referring to "black" (about black pencils, black pullovers, and black poodles), the other one about "death" (war, cancer, etc.), connected by a sentence referring to both of these, could be interpreted as one paragraph about the new, broader topic "black + death." This problem may be similar to the situation in which current formal grammars allow nonsensical but parsable collections of words (e.g., "colorless green ideas..."), while before the advent of Chomskyan formalisms, a sentence was defined as the smallest meaningful collection of words; Fowler (1965, p. 546) gives 10 definitions of a sentence.
It then seems worth differentiating between the creation of a new concept like "black + death," with a meaning given by a paraphrase of the example collection of sentences, and the acceptance of the new concept, i.e. storing it in R. In our case the concept "black + death," which does not refer to any normal experiences, would be discarded as useless, although the collection of sentences would be recognized as a strange, even if coherent, paragraph.
We can also hope for some fine-tuning of the notion of topic, which would prevent many offensive examples. This approach is taken in computational syntactic grammars (e.g. Jensen 1986); the number of unlikely parses is severely reduced whenever possible, but no attempt is made to define only the so-called grammatical strings of a language.
Finally, as the paragraph is a natural domain in which word senses can be reliably assigned to words or sentences can be syntactically disambiguated, larger chunks of discourse may be needed for precise assignment of topics, which we view as another type of disambiguation. Notice also that for coherence, as defined above, it does not matter whether the topic is defined as a longest, a shortest, or simply a sequence of predicates satisfying conditions (1) and (2); the existence of a sequence is equivalent to the existence of a shortest and a longest sequence. The reason for choosing a longest sequence as the topic is our belief that the topic should contain more information about a paragraph rather than less.
4.1 Comparison with Other Approaches
At this point it may be proper to comment on the relationship between our theory of coherence and theories advocated by others. We are going to make such a comparison with the theories proposed by J. Hobbs (1979, 1982), which represent a more computationally oriented approach to coherence, and those of T. A. van Dijk and W. Kintsch (1983), who are more interested in addressing psychological and cognitive aspects of discourse coherence. The quoted works seem to be good representatives of each of the directions; they also point to related literature.
The approach we advocate is compatible with the work of these researchers, we believe. There are, however, some interesting differences: first of all, we emphasize the role of paragraphs; second, we talk about formal principles regulating the organization and use of knowledge in language understanding; and third, we realize that natural language text (such as an on-line dictionary) can, in many cases, provide the type of commonsense background information that Hobbs (for example) advocated but didn't know how to access. (There are also some other, minor, differences. For instance, our three-level semantics does not appeal to possible worlds, as van Dijk and Kintsch do; neither is it objectivist, as Hobbs' semantics seems to be.)
We shall discuss only the first two points, since the third one has already been explained.
The chief difference between our approach and the other two lies in identifying the paragraph as a domain of coherence. Hobbs, van Dijk, and Kintsch distinguish between "local" coherence, a property of subsequent sentences, and "global" coherence, a property of discourse as a whole. Hobbs explains coherence in terms of an inventory of "local," possibly computable, coherence relations, like "elaboration," "occasion," etc. (Mann and Thompson 1983 give an even more detailed list of coherence relations than Hobbs.) Van Dijk and Kintsch do this too, but they also describe "macrostructures" representing the global content of discourse, and they emphasize psychological and cognitive strategies used by people in establishing discourse coherence. Since we have linked coherence to models of paragraphs, we can talk simply about "coherence," without adjectives, as a property of these models. To us the first, "local," domain seems to be too small, and the second, "global," one too large, for constructing meaningful computational models. To be sure, we believe relations between pairs of sentences are worth investigating, especially in dialogs. However, in written discourse, the smallest domain of coherence is a paragraph, very much as the sentence is the basic domain of grammaticality (although one can also judge the correctness of phrases).
To see the advantage of assuming that coherence is a property of a fragment of a text/discourse, and not a relation between subsequent sentences, let us consider for instance the text

John took a train from Paris to Istanbul. He likes spinach.

According to Hobbs (1979, p. 67), these two sentences are incoherent. However, the same fragment, augmented with the third sentence Mary told him yesterday that the French spinach crop failed and Turkey is the only country... (ibid.), suddenly (for Hobbs) becomes coherent. It seems that any analysis of coherence in terms of the relation between subsequent sentences cannot explain this sudden change; after all, the first two sentences didn't change when the third one was added. On the other hand, this change is easily explained when we treat the first two sentences as a paragraph: if the third sentence is not a part of the background knowledge, the paragraph is incoherent. And the paragraph obtained by adding the third sentence is coherent. Moreover, coherence here is clearly the result of the existence of the topic "John likes spinach."
We derive coherence from formal principles regulating the organization and use of knowledge in language understanding. Although, like the authors discussed above, we stress the importance of inferencing and background knowledge in determining coherence, we also address the problem of knowledge organization; for us the central problem is how a model emerges from such an organization. Hobbs sets forth hypotheses about the interaction of background knowledge with sentences that are examined at a given moment; van Dijk and Kintsch provide a wealth of psychological information on that topic. But their analyses of how such knowledge could be used are quasi-formal. Our point of departure is different: we assume a certain simple structure of the referential level (partial orders) and a natural way of using the knowledge contained there ("coherence links" + "most plausible = first"). Then we examine what corresponds to "topic" and "coherence"; they become mathematical concepts. In this sense our work refines these concepts, changes the way of looking at them by linking them to the notion of paragraph, and puts the findings of the other researchers into a new context.
5. Models of Paragraphs

We argue below that paragraphs can be mapped into models with small, finite universes. We could have chosen another, more abstract semantics, with infinite models, but in this and all cases below we have in mind computational reasons for this enterprise. Thus, as in the case of Kamp's (1981) DRS, we shall construct a kind of Herbrand model of texts, with common and proper names translated into unary predicates, intransitive verbs into unary predicates, and transitive verbs into binary predicates. In building the logical model M of a collection of formulas S corresponding to the sentences of a paragraph, we assume that the universe of M contains constants introduced by elements of S, usually by ones corresponding to NPs, and possibly by some formulas picked by the construction from the referential level. However, we are interested not in the relationship between truth conditions and representations of a sentence, but in a formalization of the way knowledge is used to produce a representation of a section of text. Therefore we need not only a logical description of the truth conditions of sentences, as presented by Kamp, but also a formal analysis of how background knowledge and metalevel operations are used in the construction of models. This extension is important and nontrivial; we doubt that one can deal effectively with coherence, anaphora, presuppositions, or the semantics of connectives without it. We have begun presenting such an analysis in Section 3, and we continue now.
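As a toy illustration of these translation conventions (it is not a parser, and the triple format, function name, and constant-naming scheme are ours), the sketch below introduces a constant for every NP, a unary predicate for every noun, and a unary or binary predicate for every verb.

    def translate(triples):
        """triples: (subject_noun, verb, object_noun or None), one per clause."""
        constants, facts = {}, []

        def const_for(noun):
            if noun not in constants:
                constants[noun] = noun[0] + str(len(constants))  # fresh constant
                facts.append(f"{noun}({constants[noun]})")        # noun -> unary predicate
            return constants[noun]

        for subj, verb, obj in triples:
            s = const_for(subj)
            if obj is None:
                facts.append(f"{verb}({s})")                      # intransitive -> unary
            else:
                facts.append(f"{verb}({s}, {const_for(obj)})")    # transitive -> binary
        return constants, facts

    print(translate([("ship", "enter", "port"), ("ship", "bring", "disease")]))
    # ({'ship': 's0', 'port': 'p1', 'disease': 'd2'},
    #  ['ship(s0)', 'port(p1)', 'enter(s0, p1)', 'disease(d2)', 'bring(s0, d2)'])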
5.1 The Example Revisited: Preparation for Building a Model
We return now to the example paragraph, to illustrate how the interaction between an object theory and a referential level produces a coherent interpretation of the text (i.e., a p-model) and resolves the anaphoric references. The method will be similar to, but more formal than, what was presented in Section 2. In order not to bore the reader with the same details all over again, we will use a shorter version of the same text.

Example 2
P1: In 1347 a ship entered the port of Messina bringing with it the disease that came to be known as the Black Death.
P2: It struck rapidly.
P3: Within twenty-four hours of infection came an agonizing death.
5.1.1 Translation to Logic. The text concerns events happening in time. Naturally, we will use a logical notation in which formulas may have temporal and event components. We assume that any formal interpretation of time will agree with the intuitive one, so it is not necessary to present a formal semantics here. The reader may consult recent papers on this subject (e.g. Moens and Steedman 1987; Webber 1987) to see what a formal interpretation of events in time might look like. Since sentences can refer to events described by other sentences, we may need also a quotation operator; Perlis (1985) describes how first order logic can be augmented with such an operator. Extending and revising Jackendoff's (1983) formalism seems to us a correct method to achieve the correspondence between syntax and semantics expressed in the grammatical constraint ("that one should prefer a semantic theory that explains otherwise arbitrary generalizations about the syntax and the lexicon"; ibid.).
However, as noted before, we will use a simplified version of such a logical notation; we will have only time, event, result, and property as primitives. After these remarks we can begin constructing the model of the example paragraph. We assume that constants are introduced by NPs. We have then
(i) Constants s, m, d, i, b, 1347 satisfying: ship(s), Messina(m), disease(d), infection(i), death(b), year(1347).

(ii) Formulae
S1: [time: year(1347); event: enter(s, m) & ship(s) & port(m) & bring(x0, d) & disease(d) & name(d, Black Death) & (x0 = s ∨ x0 = d ∨ x0 = m)]
S2: [time: past; event: rapidly: strike(y0) & (y0 = s ∨ y0 = m ∨ y0 = d)]
S3: ∃t, t′ {[time: t; infection(i)] & [time: t′ ∈ (t, t + 24h); event: come(b) & death(b) & agonizing(b)]}

The notation time: α(t); event: β should be understood as meaning that the event described by the formula β took place in (or during) the time period described by the formula α(t); t ranges over instants of time (not intervals).

Note. We assume that "strike" is used intransitively. But our construction of the p-models of the paragraph would look exactly the same for the transitive meaning, except that we would be expected to infer that people were harmed by the illness.
5.1.2 Referential Level. We use only standard dictionaries as a source of background information. The content of this information is of secondary importance; we want to stress the formal, logical side of the interaction between the referential level and the object theory. Therefore we represent both in our simplified logical notation, and not in English. All formulas at the referential level below have been obtained by a direct translation of appropriate entries in Webster's and Longman. The translation in this case was manual, but could be automated.
Referential level (a fragment):

ship(x) → { large: boat(x); ∃y carry(x, y) & (people(y) ∨ goods(y)) & agent(x); ... }   (sh1)
/* ship: a large boat for carrying people or goods on the sea */

bring(x, y) → { carry(x, y); ... }   (b1)
/* bring: to carry */

strike(x, y) → { hit(x, y); agent(x) & patient(y); ... }   (s1a)
strike(x) → { hit(x); agent(x); ... }   (s1b)
/* strike: to hit */

strike(x) → { illness(x) & ∃y suddenly: harm(x, y); ... }   (s2_ex)
/* strike: to harm suddenly; "they were struck by illness" */

disease(y) → { illness(y) & ∃z (infection(z) & causes(z, y)); ... }   (d1)
/* disease: illness caused by an infection */

death(x) → { ∃t, y [x = '[time: t; event: die(y) & (creature(y) ∨ plant(y))]'] }   (de_1)
/* death: an event in which a creature or a plant dies */

come(x) → { ∃t [time: t; event: arrive(x)] }   (ct_1)
/* to come: to arrive (...) in the course of time */

infection(x) → { ∃e, y, z [e = event: infect(y, z) & person(y) & disease(z)] & x = result(e) }   (i_1)
/* infection: the result of being infected by a disease */

agonizing(x) → { ∃y causes(x, y) & pain(y) }   (a_1)
/* agonizing: causing great pain */

enter(x, y) → { come_in(x, y); place(y) }   (e_1)
/* enter: to come into a place */
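For the constructions that follow it is convenient to think of this fragment as data. The sketch below is our own encoding, not the authors' system: argument structure, quantifiers, and the time/event brackets are deliberately dropped, and only the labels and the predicate vocabulary of each mini-theory are kept.

    # The referential-level fragment as a predicate-to-theories table (a toy
    # skeleton; only the predicates mentioned by each entry are retained).
    R = {
        "ship":      [("sh1",   {"boat", "carry", "people", "goods", "agent"})],
        "bring":     [("b1",    {"carry"})],
        "strike":    [("s1b",   {"hit", "agent"}),
                      ("s2_ex", {"illness", "harm", "suddenly"})],
        "disease":   [("d1",    {"illness", "infection", "causes"})],
        "death":     [("de_1",  {"die", "creature", "plant"})],
        "come":      [("ct_1",  {"arrive"})],
        "infection": [("i_1",   {"infect", "person", "disease", "result"})],
        "agonizing": [("a_1",   {"causes", "pain"})],
        "enter":     [("e_1",   {"come_in", "place"})],
    }

    # The two competing readings of "strike" that Section 5.2 chooses between:
    print([label for label, _ in R["strike"]])   # ['s1b', 's2_ex']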
We have shown, in Section 3, the role of preferences in building the model of a paragraph. Therefore, to make our exposition clearer, we assume that all the above theories are equally preferred. Still, some interesting things will happen before we arrive at our intended model.
5.1.3 Provability and Anaphora Resolution. To formalize a part of the process for finding antecedents of anaphors, we have to introduce a new logical notion: the relation of weak R + M-abduction. This relation would hold, for instance, between the object theory of our example paragraph and a formula expressing the equality of two constants, i and i′, denoting (respectively) the "infection" in the sentence Within twenty-four hours of infection..., and the "infection" of the theory (d1), according to which a disease is an illness caused by an infection. This equality i = i′ cannot be proven, but it may be reasonably assumed: we know that in this case the infection i′ caused the illness, which, in turn, caused the death.
The necessity of this kind of merging of arguments has been recognized before: Charniak and McDermott (1985) call it abductive unification/matching; Hobbs (1978, 1979) refers to such operations using the terms knitting or petty conversational implicature. Neither Hobbs nor Charniak and McDermott tried then to make this notion precise, but the paper by Hobbs et al. (1988) moves in that direction. The purpose of this subsection is to formalize and explain how assumptions like the one above can be made.
Definition
A formula φ is weakly provable from an object theory T, expressed as T ⊢w φ, iff there exists a partial theory T ∈ PT(T) such that T ⊢ φ, i.e. T proves φ in logic. (We call ⊢w "weak" because it is enough to find one partial theory proving a given formula.)

As an example, in the case of the three-sentence paragraph, we have a partial theory T1 based on (s1b) saying that "'it' hits rapidly," and T2 saying that "an illness ('it') harms rapidly" (s2_ex). Thus both statements are weakly provable.
Since we view the metalevel constraints M as rules for choosing models rather than as special inference rules, the definition of R + M-abduction is model-theoretic, not proof-theoretic:

Definition
A preferred model of a theory T is an element of Mods(T) that satisfies the metalevel constraints contained in M. The set of all preferred models of T is denoted by PM(T).
A formula φ of L(=), the language with equality, is weakly R + M-abductible from an object theory T, denoted by T ⊢R+M φ, iff there exists a partial theory T ∈ PT(T) and a preferred model M ∈ PM(T) such that M ⊨ φ, i.e. φ is true in at least one preferred model of the partial theory T.

Note: The notions of strong provability and strong R + M-abduction can be introduced by replacing "there exists" by "all" in the above definitions (cf. Zadrozny 1987b). We will have, however, no need for "strong" notions in this paper. Also, in a practical system, "satisfies" should probably be replaced by "violates fewest."
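The two notions can be paraphrased in a propositional toy setting (equality, quantifiers, and the actual metalevel are abstracted away; closure, weakly_provable, and weakly_abductible are our names, not the authors'): a formula is weakly provable if some partial theory proves it, and weakly R + M-abductible if it holds in at least one preferred model of some partial theory.

    def closure(facts, rules):
        """Forward-chain Horn rules (body_set, head) to a fixpoint."""
        facts, changed = set(facts), True
        while changed:
            changed = False
            for body, head in rules:
                if body <= facts and head not in facts:
                    facts.add(head)
                    changed = True
        return facts

    def weakly_provable(phi, partial_theories):
        # it is enough that ONE partial theory proves phi
        return any(phi in closure(facts, rules) for facts, rules in partial_theories)

    def weakly_abductible(phi, partial_theories, preferred_models):
        # preferred_models(T) stands in for PM(T): the models passing the
        # metalevel constraints; here it is simply a supplied function.
        return any(phi in m for T in partial_theories for m in preferred_models(T))

    # T1 is built on (s1b): "it hits"; T2 on (s2_ex): "an illness harms".
    T1 = ({"strike(y0)"}, [({"strike(y0)"}, "hit(y0)")])
    T2 = ({"strike(y0)"}, [({"strike(y0)"}, "illness(y0)")])
    print(weakly_provable("hit(y0)", [T1, T2]))        # True (via T1)
    print(weakly_provable("illness(y0)", [T1, T2]))    # True (via T2)
    print(weakly_abductible("illness(y0)", [T1, T2],
                            lambda T: [closure(*T)]))  # True, one model per theory here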
Obviously, it is better to have references of pronouns resolved than not. After all, we assume that texts make sense, and that authors know these references. That applies to references of noun phrases too. On the other hand, there must be some restrictions on possible references; we would rather assume that "spinach" ≠ "train" (i.e. (∀x, y)(spinach(x) & train(y) → x ≠ y)), or "ship" ≠ "disease." Two elementary conditions limiting the number of equalities are: an equality N1 = N2 may be assumed only if either N1 and N2 are listed as synonyms (or paraphrases) or their equality is explicitly asserted by the partial theory T. Of course there are other conditions, like "typically, the determiner 'a' introduces a new entity, while 'the' refers to an already introduced constant." (But notice that in our example paragraph "infection" appears without an article.) All these, and other, guidelines can be articulated in the form of metarules.
We define another partial order, this time on the models Mods(T) of a partial theory T of a paragraph: M1 >= M2 if M1 satisfies more R + M-abductible equalities than M2. The principle articulating the preference for having references resolved can now be expressed as

Metarule 1
Assume that T ∈ PT(P) is a partial theory of a paragraph P. Every preferred model M ∈ PM(T) is a maximal element of the ordering >= of Mods(T).
To explain the meaning of the metarule, let us analyze the paragraph (P1, P2, P3) and the background knowledge needed for some kind of rudimentary understanding of that text. The rule (i_1) (infection is the result of being infected by a disease...), dealing with the infection i, introduces a disease d1; we also know about the existence of the disease d in 1347. Now, notice that there may be many models satisfying the object theory of the paragraph P augmented by the background knowledge. But we can find two among them: in one, call it M1, d and d1 are identical; in the other one, M2, they are distinct. The rule says that only the first one has a chance to be a preferred model of the paragraph; it has more noun phrase references resolved than the other model, or, formally, it satisfies more R + M-abductible equalities, and therefore M1 >= M2.
This reasoning, as the reader surely has noticed, resembles the example about infections from the beginning of this section. The difference between the cases lies in the equality d = d1 being the result of a formal choice of a model, while i = i′ wasn't proved, just "reasonably" assumed.
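In a toy form, Metarule 1 is just a filter over candidate models; the representation of models as sets of atoms and of the abductible equalities as strings is ours.

    def preferred_by_metarule1(models, abductible_equalities):
        """Keep the models that satisfy a maximal number of the equalities."""
        def score(m):
            return sum(1 for eq in abductible_equalities if eq in m)
        best = max(score(m) for m in models)
        return [m for m in models if score(m) == best]

    M1 = {"disease(d)", "disease(d1)", "d = d1"}   # d and d1 identified
    M2 = {"disease(d)", "disease(d1)"}             # kept distinct
    print(preferred_by_metarule1([M1, M2], ["d = d1"]))
    # -> [M1]: only the model with the reference resolved survives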
In interpreting texts, knowledge of typical subjects and typical objects of verbs helps in anaphora resolution (cf. Braden-Harder and Zadrozny 1990). Thus if we know that A farmer grows vegetables, either having obtained this information directly from a text or from R, we can reasonably assume that He also grows some cotton refers to the farmer, and not to a policeman mentioned in the same paragraph. Of course, this should be only a defeasible assumption, if nothing indicates otherwise. We now want to express this strategy as a metarule:

Metarule 2
Let us assume that it is known that P(a, b) & Q(a) & R(b), and it is not known that P(a′, X), for any X. Then models in which P(a, c) & R′(c) holds are preferred to models in which P(a′, c) & R′(c) is true.

One can think of this rule as a model-theoretic version of Ockham's razor or abduction; it says "minimize the number of things that have the property P(·, ·)," and it allows us to draw certain conclusions on the basis of partial information. We shall see it in action in Section 5.2.
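A correspondingly small sketch of Metarule 2 (again our own encoding, with facts represented as (relation, arguments) pairs) prefers the models that attach the new P-fact to an individual already known to stand in the relation P.

    def prefer_by_metarule2(known, candidates, relation):
        """Prefer candidate models whose agents of `relation` are already known agents."""
        known_agents = {args[0] for rel, args in known if rel == relation}
        def agents(model):
            return {args[0] for rel, args in model if rel == relation}
        kept = [m for m in candidates if agents(m) <= known_agents]
        return kept or candidates          # fall back if nothing qualifies

    known = [("grows", ("farmer", "vegetables")), ("policeman", ("p",))]
    M1 = [("grows", ("farmer", "cotton"))]      # "he" = the farmer
    M2 = [("grows", ("policeman", "cotton"))]   # "he" = the policeman
    print(prefer_by_metarule2(known, [M1, M2], "grows"))   # -> [M1]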
We have no doubt that various other metarules will be necessary; clearly, our two metarules cannot constitute the whole theory of anaphora resolution. They are intended as an illustration of the power of abduction, which in this framework helps determine the universe of the model (that is, the set of entities that appear in it). Other factors, such as the role of focus (Grosz 1977, 1978; Sidner 1983) or quantifier scoping (Webber 1983), must play a role too. Determining the relative importance of those factors, the above metarules, and syntactic clues appears to be an interesting topic in itself.
Note: In our translation from English to logic we are assuming that "it" is anaphoric (with the pronoun following the element that it refers to), not cataphoric (the other way around). This means that the "it" that brought the disease in P1 will not be considered to refer to the infection "i" or the death "b" in P3. This strategy is certainly the right one to start out with, since anaphora is always the more typical direction of reference in English prose (Halliday and Hasan 1976, p. 329).
Since techniques developed elsewhere may prove useful, at least for comparison, it is worth mentioning at this point that the proposed metarules are distant cousins of the "unique-name assumption" (Genesereth and Nilsson 1987), the "domain closure assumption" (ibid.), "domain circumscription" (cf. Etherington and Mercer 1987), and their kin. Similarly, the notion of R + M-abduction is spiritually related to the "abductive inference" of Reggia (1985), the "diagnosis from first principles" of Reiter (1987), the "explainability" of Poole (1988), and the subset principle of Berwick (1986). But, obviously, trying to establish precise connections for the metarules or for the provability and the R + M-abduction would go much beyond the scope of an argument for the correspondence of paragraphs and models. These connections are being examined elsewhere (Zadrozny forthcoming).
5.2 p-Models
The construction of a model of a paragraph, a p-model, must be based on the information contained in the paragraph itself (the object theory) and in the referential level, while the metalevel restricts the ways the model can be constructed, or, in other words, provides criteria for choosing a most plausible model (or models) if a partial theory is ambiguous. This role of the metarules will be clearly visible in finding references of pronouns: in a simple case requiring only a rule postulating that these references be searched for, and in a more complex case (in Section 6) when they can be found only by an interplay of background knowledge and (a formalization of) Gricean maxims.
Definition
M is a p-model of a paragraph P iff there exists a coherent partial theory T ∈ PT(P) such that M ∈ PM(T).

Having defined the notion of a p-model, we can now mimic, in logic, the reasoning presented in Section 2.2. Using background information and the translation of the sentences, we build a p-model of the paragraph. This involves determining the references of the pronoun "it," and deciding whether "struck" in the sentence It struck rapidly means "hit" (s1b) or "harmed" (s2_ex). We have then two meanings of "strike" and a number of possibilities for the pronouns.
We begin by constructing the two classes of preferred models given by (s1b) and (s2_ex), respectively. It is easily seen that the models of the first class, based on {S2, s1b} (that is, rapidly: strike(y0) and strike(x) → hit(x)...), together with all other available information, do not let us R+M-abduct anything about y0, i.e., the referent of the subject pronoun "it" in P2 (It struck rapidly). On the other hand, from {S2, s2_ex, d1} we R + M-abduct that y0 = d, i.e. the disease struck rapidly. That is the case because s2_ex implies that the agent that "struck rapidly" is actually an illness. From rapidly: strike(y0), strike(x) → illness(x) & ..., disease(y) → illness(y) & ..., and disease(d) we can infer illness(y0) and illness(d); by Metarule (1) we conclude that y0 = d. In other words, the referent of the subject "it" is "disease." Thus Metarule (1) immediately eliminates all the models of the first class, given by (s1b), in which "struck" means "hit."
Notice that we cannot prove in classical logic that the ship brought the disease. But we are allowed to assume it by the above formal rule as the most plausible state of affairs, or, in other words, we prove it in our three-level logic.
We are left then with models of the three sentences (S1, S2, S3) that contain {S2, s2_ex, d1}; they all satisfy y0 = d. We now use {S1, sh1, b1} (enter(s, m) & ship(s) & bring(x0, d) & ...; ship(s) → (∃y) carry(s, y) & ...; ∀z [bring(x0, z) → carry(x0, z)]). From these facts we can conclude by Metarule (1) that x0 = s: a "ship" is an agent that carries goods; to "bring" means to "carry"; and the disease has been brought by something. We obtain carry(x0, d) and carry(s, y), and then by Metarule (2), carry(s, d). That is, the referent of the pronoun "it" in P1 (... bringing with it the disease...) should be "ship." Observe that we do not assert about the disease that it is a kind of goods or people; the line of reasoning goes as follows: since ships are known to carry people or goods, and ports are not known to carry anything, we may assume that the ship carried the disease along with its standard cargo.
Having resolved all pronoun references, with no ambiguity left, we conclude that the class PM(P) consists of only one model, based on the partial theory

{S1, S2, S3, sh1, b1, e_1, s2_ex, d1, de_1, i_1, a_1, ct_1}.

The model describes a situation in which the ship came into the port/harbor; the ship brought the disease; the disease was caused by an infection; the disease harmed rapidly, causing a painful death; and so on.
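The whole chain can be compressed into a few lines of toy forward chaining; the rule names follow the text, but the propositional encoding and the hard-wired applications of Metarules 1 and 2 are ours.

    facts = {"strike(y0)", "disease(d)", "bring(x0,d)", "ship(s)", "port(m)"}
    rules = [                                    # label, premise, conclusion
        ("s2_ex", "strike(y0)",  "illness(y0)"),
        ("d1",    "disease(d)",  "illness(d)"),
        ("b1",    "bring(x0,d)", "carry(x0,d)"),
        ("sh1",   "ship(s)",     "carry(s,cargo)"),
    ]
    for _, premise, conclusion in rules:
        if premise in facts:
            facts.add(conclusion)

    # Metarule 1: y0 and d are both illnesses, so identify them (y0 = d).
    if {"illness(y0)", "illness(d)"} <= facts:
        facts.add("y0 = d")
    # Metarule 2: something carries d, and only the ship is known to carry,
    # hence carry(s, d) and x0 = s.
    if "carry(x0,d)" in facts and "carry(s,cargo)" in facts:
        facts.add("x0 = s")

    print(sorted(f for f in facts if "=" in f))   # ['x0 = s', 'y0 = d']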
The topic Tp of (P1, P2, P3) is disease(x). The first sentence talks about it; the second one refers to it using the pronoun "it"; and the third one extends our knowledge about the topic, since "disease" is linked to "infection" through d1. Furthermore, "disease" is the only noun phrase mentioned or referred to in all three sentences; the sentence S1 is the topic sentence. The p-model of the paragraph is represented by Figure 3.

[Figure 3: The p-model for the example paragraph.]
But notice that our definition of topic licenses also other analyses, for example, one in which all the predicates of the first sentence constitute the topic of the paragraph, S2 elaborates S1 (in the sense of condition 1(c) of the definition of topic), and S3 elaborates S2. Based on the larger context, we prefer the first analysis; however, a computational criterion for such a preference remains an open problem.
6. On the Role of the Metalevel

We have already seen examples of the application of metalevel rules. In the analysis of the paragraph, we applied one such rule expressing our commonsense knowledge about the usage of pronouns. In this section we discuss two other sources of metalevel axioms: Gricean cooperative principles, which reduce the number of possible interpretations of a text or an utterance; and connectives and modalities, such as "but," "unless," or "maybe," which refer to the process of constructing the models or partial theories, and to some operations on them (see Figure 4).
We can see then two applications of metarules: in constructing models of a text from representations of sentences, and in reducing, or constraining, the ambiguity of the obtained structure. We begin by showing how to formalize the latter. In the next subsection (6.1), assuming the Gricean maxims to be constraints on language communication, either spoken or written, we use their formal versions in building partial theories. A specific instance of the rule of "quantity" turns out to be applicable to anaphora resolution. That example will end our discussion of anaphora in this article.
The last topic we intend to tackle is the semantic role of conjunctions. In subsection 6.2 we present a metalevel axiom dealing with the semantic role of the adversative conjunction "but"; then we talk about some of its consequences for constructing models of text. This will complete our investigation of the most important issues concerning paragraph structure: coherence (how one can determine that a paragraph expresses a "thought"), anaphora (how one can compute "links" between entities that a paragraph talks about), and cohesion (what makes a paragraph more than just a sum of sentences). Of course, we will not have final answers to any of these problems, but we do believe that the direction of search for computational models of text will be visible at that point.
We assume a flat structure of the metalevel, envisioning it as a collection of (closed) formulas written in the language of set theory or higher order logic. In either of the two theories it is possible to define the notions of a model, satisfiability, provability, etc. for any first order language (cf. e.g. Shoenfield 1967); therefore the metalevel formulas can say how partial theories should be constructed (specifying, for instance, how mini-theories are combined into partial theories) and what kinds of models are admissible. The metarules thus form a logical theory in a special language, such as the language of ZF set theory. However, for the sake of readability, we express all of them in English.
6.1 A Formalization of Gricean Maxims
A Gricean Cooperative Principle applies to text, too. For instance, in normal writing people do not express common knowledge about typical functions of objects. In fact, as the reader may check for himself, there is nothing in the Gricean maxims that does not apply to written language. That the maxims play a semantic role is hardly surprising. But that they can be axiomatized and used in building formal models of texts is new. We present in the next couple of paragraphs our formalization of the first maxim, and sketch axiomatizations of the others. Then we will apply the formal rule in an example.
Gricean maxims, after formalization, belong to the metalevel. This can be seen from our formalization of the rule "don't say too much." To this end we define redundancy of a partial theory T of a paragraph as the situation in which some sentences can be logically derived from other sentences and from the theory T in a direct manner:
(∃S ∈ P̃)(∃σ ∈ R)[σ ∈ T & σ = φ → ψ & P̃ ⊢ φ & ({ψ} ∪ (P̃ − {S})) ⊢ S]

The meaning of this formula can be explained as follows: a paragraph P has been translated into its formal version P̃ and is to be examined for redundancy. Its partial theory PT(P̃) has also been computed. The test will turn positive if, for some sentence S, we can find a rule/theorem σ = φ → ψ in PT(P) such that the sentence S is implied (in classical logic) by the other sentences and ψ. For example, if the paragraph about the Black Death were to contain also the sentence The ship carried people or goods, or both, which (in its logical form) belongs to R, it would be redundant: σ = (sh1), there. Similarly, the definition takes care of the redundancy resulting from a simple repetition.

Quantity. Say neither too much nor too little.
Quality. Try to make your contribution one that is true.
Relation. Be relevant.
Manner. Avoid obscurity and ambiguity; be brief and orderly.

Figure 4
The Gricean maxims
Metarule G1a
(nonredundancy) If T1, T2 ∈ PT(P) and T1 is less redundant than T2, then the theory T1 is preferred to T2. (Where "less redundant" means that the number of redundant sentences in T1 is smaller than in T2.)
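In the propositional toy setting used earlier, the redundancy test and G1a can be sketched as follows; entails, redundant_sentences, and prefer_by_G1a are our names, sentences are atoms, and rules are Horn clauses.

    def entails(premises, rules, goal):
        facts, changed = set(premises), True
        while changed:
            changed = False
            for body, head in rules:
                if body <= facts and head not in facts:
                    facts.add(head)
                    changed = True
        return goal in facts

    def redundant_sentences(sentences, theory_rules):
        """Sentences derivable from the others plus the consequent of some rule."""
        redundant = []
        for s in sentences:
            rest = [x for x in sentences if x != s]
            for body, head in theory_rules:
                if all(entails(sentences, theory_rules, b) for b in body) \
                        and entails(set(rest) | {head}, theory_rules, s):
                    redundant.append(s)
                    break
        return redundant

    def prefer_by_G1a(candidates):
        """candidates: (sentences, rules) pairs; fewer redundant sentences wins."""
        return min(candidates, key=lambda c: len(redundant_sentences(*c)))

    # Adding "the ship carried people or goods" to the Black Death paragraph
    # is redundant, because (sh1) already yields it from "ship":
    sents = ["ship", "disease", "carries_people_or_goods"]
    rules = [({"ship"}, "carries_people_or_goods")]
    print(redundant_sentences(sents, rules))   # ['carries_people_or_goods']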
The relevant half of the Maxim of Quantity has been expressed by G1a. How would we express the other maxims? The "too little" part of the first maxim might be represented as a preference for unambiguous partial theories. The second maxim has been assumed all the time: when constructing partial theories or models, the sentences of a paragraph are assumed to be true. The Maxim of Manner seems to us to be more relevant for critiquing the style of a written passage or for natural language generation; in the case of text generation, it can be construed as a requirement that the produced text be coherent and cohesive.
We do not claim that G1a is the best or unique way of expressing the rule "assume that the writer did not say too much." Rather, we stress the possibility that one can axiomatize and productively use such a rule. We shall see this in the next example: two sentences, regarded as a fragment of a paragraph, are a variation on a theme by Hobbs (1979).
Example 3
The captain is worried because the third officer can open his safe. He knows the combination.

The above metarule postulating "nonredundancy" implies that "he" = "the third officer" and "his" = "the captain's" are the referents of the pronouns. This is because the formula

safe(x) → (owns(y, x) & cmbntn(z, x) → knows(y, z) & can_open(y, x)) ∈ Tsafe

belongs to R, since it is common knowledge about safes that they have owners, and also combinations that are known to the owners. Therefore "his" = "the third officer's" would produce a redundant formula, corresponding to the sentence The third officer can open the third officer's safe. By the same token, The captain knows the combination would be redundant too.
We now explain the details of this reasoning. One first proves that "his" = "the captain's." Indeed, if "his" = "the third officer's," then our example sentence would mean

? The captain is worried because the third officer can open the third officer's safe;

in logic:

captain(x) & worry(x, s) & sentence(s) & s = 'the third officer can open the third officer's safe.'
We assume also, based on common knowledge about worrying, that worry(x′, s′) → ◊S. That is, one worries about things that might possibly be or become true (S denotes the logical formula corresponding to the sentence s; cf. Section 3); but one doesn't worry about things that are accepted as (almost) always true (such as the law of gravity), so that worry(x′, s′) → ¬(S ∈ Tf), where f ranges over subformulas of S.
In our case, S immediately follows from Tsafe and X, where X = safe(sf) & third_officer(o) & owns(o, sf); the fact that "the third officer can open the third officer's safe" is a consequence of general knowledge about the ownership of safes. Therefore the interpretation with "his" = "the captain's" is preferred as less redundant by the rule G1a. This theory contains representations of the two sentences, the theory of safes, a theory of worrying, and the equality "his" = "the captain's."
It remains to prove that "he" = "the third officer." Otherwise we have

P1. The captain is worried because the third officer can open the captain's safe.
P2. ? The captain knows the combination.

Clearly, the last sentence is true but redundant; the theory of "safe" and P1 entail P2:

{P1} ∪ Tsafe ⊢ P2

We are left with the combination

Q1. The captain is worried because the third officer can open the captain's safe.
Q2. ? The third officer knows the combination.

In this case, Q2 does not follow from {Q1} ∪ Tsafe, and therefore Q1, Q2 is preferred to P1, P2 (by G1a). We obtain then

The captain is worried because the third officer can open the captain's safe. The third officer knows the combination.

as the most plausible interpretation of our example sentences.
Note: The reader must have noticed that we did not bother to distinguish the sentences P1, P2, Q1, and Q2 from their logical forms. Representing "because" and "know" adequately should be considered a separate topic; representing the rest (in the first order convention of this paper) is trivial.
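The same selection can be replayed mechanically. In the standalone sketch below the safe theory is hard-wired into one scoring function (our stand-in for Tsafe and the entailment tests above), and the pronoun assignment with the fewest redundant sentences wins.

    def redundancy(opener, knower, owner):
        """How many of the two sentences the safe theory already entails."""
        can_open_redundant = (opener == owner)   # "X can open X's own safe"
        knows_redundant = (knower == owner)      # "the owner knows the combination"
        return int(can_open_redundant) + int(knows_redundant)

    assignments = [(his, he)                     # his = owner of the safe, he = knower
                   for his in ("captain", "third_officer")
                   for he in ("captain", "third_officer")]
    scored = [((his, he), redundancy("third_officer", he, his))
              for his, he in assignments]
    print(min(scored, key=lambda x: x[1]))
    # -> (('captain', 'third_officer'), 0):
    #    "his" = the captain's, "he" = the third officer, as derived above.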
6.1.1 Was the Use of a Gricean Maxim Necessary? Can one deal effectively with the problem of reference without axiomatized Gricean maxims, for instance by using only "petty conversational implicature" (Hobbs 1979), or the metarules of Section 5.2? It seems to us that the answer is no.
As a case in point, consider the process of finding the antecedent of the anaphor "he" in the sentences

John can open Bill's safe. He knows the combination.

Hobbs (1979, 1982) proves "he" = "John" by assuming the relation of "elaboration" between the sentences. (Elaboration is a relation between two segments of a text. It intuitively means "expressing the same thought from a different perspective," but has been defined formally as the existence of a proposition implied by both segments; here the proposition is "John can open the safe.") However, if we change the pair to the triple

Bill has a safe under the painting of his yacht. John can open Bill's safe. He knows the combination.

the relation of elaboration holds between the segment consisting of the first two sentences of the triple and each of the two possible readings: John knows the combination and Bill knows the combination. In this case, elaboration cannot choose the correct referent, but the rule G1a can and does. Clearly, an elaboration should not degenerate into redundancy; the Gricean maxims are to keep it fresh.
As we have observed, correct interpretations cannot be chosen by an interaction of an object level theory and a referential level alone, because coherence, plausibility, and consistency are too weak to weed out wrong partial theories. Metarules are necessary. True, the captain knew the combination, but it was consistent that "his" might have referred to "the third officer's."
6.2 Semantics of the Conjunction "But"
Any analysis of natural language text, to be useful for a computational system, will have to deal with coherence, anaphora, and connectives. We have examined so far the first two concepts; we shall present now our view of connectives, to complete the argument about paragraphs being counterparts of models. We present a metalevel rule that governs the behavior of the conjunction "but"; we formalize the manner in which "but" carries out the contradiction. Then we derive from it two rules that prevent infelicitous uses of "but."
Connectives are function words, like conjunctions and some adverbs, that are responsible simultaneously for maintaining cohesiveness within the text and for signaling the nature of the relationships that hold between and among various text units. "And," "or," and "but" are the three main coordinating connectives in English. However, "but" does not behave quite like the other two: semantically, "but" signals a contradiction, and in this role it seems to have three subfunctions:

1. Opposition (called "adversative" or "contrary-to-expectation" by Halliday and Hasan 1976; cf. also Quirk et al. 1972, p. 672).
   The ship arrived but the passengers could not get off.
   The yacht is cheap but elegant.

2. Comparison. In this function, the first conjunct is not so directly contradicted by the second. A contradiction exists, but we may have to go through additional levels of implication to find it. Consider the sentence:
   That basketball player is short, but he's very quick.

3. Affirmation. This use of "but" always follows a negative clause, and actually augments the meaning of the preceding clause by adding supporting information:
   The disease not only killed thousands of people, but also ended a period of economic welfare.

In this section we consider only the first, or adversative, function of the coordinating conjunction "but."
6.2.1 The Semantic Function of "But." "But" introduces an element of surprise into discourse. Because it expresses some kind of contradiction, "but" has no role in the propositional calculus equivalent to the roles filled by "and" and "or." Although there are logical formation rules using the conjunction operator ("and") and the disjunction operator ("or"), there is no "but" operator. What, then, is the semantic role of "but"? We believe that its function should be described at the metalevel as one of many rules guiding the construction of partial theories. This is expressed below.
Metarule (BUT)
The formulas φ but Ψ, φ′ but Ψ′, ... of a (formal representation of a) paragraph P are to be interpreted as follows:

In the construction of any T ∈ PT(P), instead of taking the union ∪ of the theories σ → Tσ, take the union of σ → Tσ/{Ψ, Ψ′, ...}.

The symbol σ → Tσ/{Ψ, Ψ′, ...} denotes a maximal consistent with {σ, Ψ, Ψ′, ...} subtheory of σ → Tσ; in general, T/T′ will be a maximal consistent with T′ subtheory of T.
"But" is then an order to delete from background information everything contradicting Ψ, but to use what remains. Notice that "and" does not have this meaning; a model for φ and Ψ will not contain any part of a theory that contradicts either of the clauses φ or Ψ.
Typically this rule will be used to override defaults, to say that the expected consequences of the first conjunct hold except for the fact expressed by the second conjunct; for instance: We were coming to see you, but it rained (so we didn't). The rule BUT is supposed to capture the "contrary-to-expectation" function of "but."
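A literal-level sketch of the rule may help; consistency is reduced here to the absence of a literal and its negation (written with a leading "~"), restrict and interpret_but are our names, and restrict builds its maximal consistent subtheory greedily rather than by full search.

    def consistent(literals):
        return not any(("~" + l) in literals for l in literals if not l.startswith("~"))

    def restrict(theory, context):
        """T / T': keep the literals of `theory` that stay consistent with `context`."""
        kept = []
        for lit in theory:
            if consistent(set(kept) | set(context) | {lit}):
                kept.append(lit)
        return kept

    def interpret_but(phi_theories, psi):
        """phi but psi: the union of each sigma -> T_sigma cut down to be
        consistent with psi, plus psi itself."""
        result = set(psi)
        for _sigma, t_sigma in phi_theories:
            result.update(restrict(t_sigma, psi))
        return result

    # "We were coming to see you, but it rained (so we didn't)":
    coming_theory = [("coming", ["will_arrive", "expected_visit"])]
    print(interpret_but(coming_theory, ["~will_arrive"]))
    # -> {'~will_arrive', 'expected_visit'}: the default consequence is dropped,
    #    the rest of the background theory is kept.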
We present now a simple example of building a model of a one-sentence paragraph containing "but." We will use this example to explain how the rule BUT can be used. Using the background information presented below, we will construct a partial model for this one-sentence paragraph.

Example 4
This yacht is cheap, but it is elegant.
Referential level (a fragment)

cheap(x) → { ¬elegant(x); poor_quality(x); ¬expensive(x) }   (c1)
expensive(x) → ¬cheap(x)   (enc1)
yacht(x) → { ship(x) & small(x) }   (y1)
elegant(x) → { ¬cheap(x); ... }   (e1)
elegant(x) & yacht(x) → { status_symbol(x) }   (e_y1)

Note: Compare (y1) with (c1); in (y1) smallness is a property of a ship; this would be more precisely expressed as yacht(x) → [ship(x); property: small(x)]. This trick allows us not to conclude that "a big ant is big," or "a small elephant is small."
We ignore the problem of multiple meanings (theories) of predicates, and assume the trivial ordering in which all formulas are equally preferred. (But note that (e_y1) is still preferred to (e1) as a more specific theory of "elegant"; cf. Section 3.)
Construction of the model: In our case φ ≡ yacht(y0) & cheap(y0) and Ψ ≡ elegant(y0). Then

Tcheap = { ¬elegant(x); poor_quality(x); ¬expensive(x) }
Tyacht = { ship(x) & small(x) }
Telegant = { expensive(x) }
Tyacht&elegant = { status_symbol(x) }

Tcheap/Ψ = { poor_quality(x); ¬expensive(x) }
Tyacht/Ψ = Tyacht
Tyacht&elegant/Ψ = Tyacht&elegant

We can now use the Metarule (BUT) and construct the partial theories of the sentence. In this case, there is only one:

PT(φ but Ψ) = {T}, where
T = { yacht(y0), elegant(y0), cheap(y0), ship(y0) & small(y0), status_symbol(y0), poor_quality(y0) }

In other words, the yacht in question is a poor quality, small, and elegant ship serving as an inexpensive status symbol.
The partial model of the theory T is quite trivial: it consists of one entity representing the yacht and of a bunch of its attributes. However, the size of the model is not important here; what counts is the method of derivation of the partial theory.
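Rerun mechanically, the construction above looks as follows. This is a standalone toy: the conflicts that (c1), (e1), and (enc1) generate are listed by hand instead of being derived, the exclusion of expensive(y0), which clashes with cheap(y0) via (enc1), reflects our reading of the construction, and ¬expensive(y0) from Tcheap/Ψ is kept, in line with the "inexpensive status symbol" gloss.

    theories = {                         # the mini-theories used in the text
        "cheap":         ["~elegant", "poor_quality", "~expensive"],
        "yacht":         ["ship", "small"],
        "elegant":       ["expensive"],
        "yacht&elegant": ["status_symbol"],
    }
    conflicts_with_psi = {"~elegant"}               # psi = elegant(y0)
    conflicts_with_phi = {"expensive", "~cheap"}    # phi = yacht(y0) & cheap(y0)

    T = {"yacht(y0)", "cheap(y0)", "elegant(y0)"}
    for sigma, body in theories.items():
        blocked = conflicts_with_phi if sigma == "elegant" else conflicts_with_psi
        T.update(f"{lit}(y0)" for lit in body if lit not in blocked)

    print(sorted(T))
    # ['cheap(y0)', 'elegant(y0)', 'poor_quality(y0)', 'ship(y0)', 'small(y0)',
    #  'status_symbol(y0)', 'yacht(y0)', '~expensive(y0)']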
6.2.2 Confirming the Analysis. The Metarule (BUT) is supposed to capture the "contrary-to-expectation" function of "but." The next two rules follow from our formalization; their validity indirectly supports the plausibility of our analysis of "but."

BUT_C1: φ but ¬Ψ is incorrect, if φ → Ψ is a "law."
e.g. Henry was murdered but not killed.

Our referential level is a collection of partially ordered theories; we have expressed the fact that a theory of some φ is a "law" by deleting the empty interpretation of φ from the partial order. If we accept the definition of a concept as given by necessary and sufficient conditions, the theories would all appear as laws. If we subscribe to a more realistic view where definitions are given by a collection of central/prototypical and peripheral conditions, only the peripheral ones can be contradicted by "but." In either formalization we get BUT_C1 as a consequence: since "laws" cannot be deleted, BUT can't be applied, and hence its use in those kinds of sentences would be incorrect.
W. Labov (1973) discussed sentences of the form

* This is a chair but you can sit on it.

The sentence is incorrect, since the function "one can sit on it" belongs to the core of the concept "chair"; so, contrary to the role of "but," the sentence does not contain any surprising new elements. Using the Metarule (BUT) and the cooperative principle of Grice, we get

BUT_C2: φ but Ψ is incorrect, if φ → Ψ is a "law."
The Metarule (BUT) gives the semantics of "but;" the rules BUT_C1 and BUT_C2 follow from it (after formalization in a sufficiently strong metalanguage such as type theory or set theory). We can link all of them to procedures for constructing and evaluating models of text. Are they sufficient? Certainly not. We have not dealt with the other usages of "but;" neither have we shown how to deal with the apparent asymmetry of conclusions: cheap but elegant seems to imply "worth buying," but elegant but cheap doesn't; and we have ignored possible prototypical effects in our semantics. However, we do believe that other rules, dealing with "but" or with other connectives, can be conveniently expressed in our framework. (The main idea is that one should talk explicitly and formally about relations between text and background knowledge, and that this knowledge is more than just a list of facts: it has structure, and it is ambiguous.) Furthermore, the semantics of "but" as described above is computationally tractable; a toy felicity check along these lines is sketched below.
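To illustrate that tractability, here is a naive felicity test derived from BUT_C1 and BUT_C2, reusing the string-based representation of the previous sketch (the two helpers are repeated so that the fragment stands alone). The table of "laws," mapping a concept to its non-deletable core formulas, and the name but_is_felicitous are assumptions made for this example only, not part of our formalism.

    def negate(formula):                        # repeated from the earlier sketch
        return formula[1:] if formula.startswith("¬") else "¬" + formula

    def instantiate(theory, const):             # repeated from the earlier sketch
        return {f.replace("(x)", "(" + const + ")") for f in theory}

    def but_is_felicitous(phi_concepts, psi, laws, const):
        """Reject 'phi but psi' when psi contradicts a law of phi (BUT_C1)
        or merely restates one (BUT_C2)."""
        for concept in phi_concepts:
            for law in instantiate(laws.get(concept, set()), const):
                if negate(law) in psi:
                    return False                # BUT_C1: "murdered but not killed"
                if law in psi:
                    return False                # BUT_C2: "a chair but you can sit on it"
        return True

    laws = {
        "murdered": {"killed(x)"},              # being killed is a necessary condition
        "chair":    {"can_sit_on(x)"},          # sitting belongs to the core of "chair"
    }
    print(but_is_felicitous({"murdered"}, {"¬killed(h_0)"}, laws, "h_0"))        # False
    print(but_is_felicitous({"chair"}, {"can_sit_on(c_0)"}, laws, "c_0"))        # False
    print(but_is_felicitous({"yacht", "cheap"}, {"elegant(y_0)"}, laws, "y_0"))  # True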
We also believe that one could not present a similarly natural account of the semantics of "but" in traditional logics, because classical logics withstand contradictions with great difficulty. Contradiction, however, is precisely what "but" expresses. Notice that certain types of coordinating conjunctions often have their counterparts in classical logic: copulative (and, or, neither-nor, besides, sometimes etc.), disjunctive (like either-or), illative (hence, for that reason). Others, such as explanatory (namely or viz.) or causal (for) conjunctions can probably be expressed somehow, for better or worse, within a classical framework. Thus the class of adversative conjunctions (but, still, and yet, nevertheless) is in that sense unique.
7. Conclusions
We hope that the reader has found this paper coherent, and its topic, the correspondence between paragraphs and models, interesting. Our strategy was to divide the
subject into three subtopics: a theory of anaphora, which corresponds to the logical theory of equality in p-models; a theory of background knowledge, expressed as the logical theory of reference in the three-level semantics; and principles of communication encoded in metarules. These principles include Gricean maxims and the semantics of cohesion, and specify a model theory for the three-level semantics. The framework resulting from putting these theories together is computational, empirical, and verifiable (even if incomplete); furthermore, it has strong links to already existing natural language processing systems. In particular, the new logical level, the referential level, is exemplified by on-line dictionaries and other reference works, from which we extract background information about defaults and plausibility rankings.
We also hope that the reader would be inclined to share our belief that natural language text can be properly and usefully analyzed by means of a three-level semantics that includes an object level, a metalevel, and a referential level. We believe that the coherence of an essay, a paper, or a book can be described by an extension of our theory. The work of van Dijk and Kintsch (1983) on "macrostructures" could probably form the basis for such an expansion. Similarly, much of the abovementioned work by Hobbs, Webber, Grosz, and Sidner can be incorporated into this framework.
Our main intention was to demonstrate that formal notions of background knowledge can be used to

- define coherence, make it semantically distinct from mere consistency, and link it formally with the notion of a topic;
- define a class of p-models, the logical models of paragraphs;
- provide a semantics for "but" (which exemplifies our understanding of grammatical cohesion);
- express the Gricean maxims formally, and use them in a computational model of communication (which seems to contradict Allen 1987, p. 464).
Moreover, we tried to convince the reader that the paragraph is an important linguistic unit, not only because of its pragmatic importance exemplified by coherence and links to background knowledge, but also because of its role in the assignment of syntactic structures (viz. ellipsis) and in semantics (viz. its possible role in evaluating semantic representations).

A great many issues have been omitted from our analysis. Thus, although we are aware that anaphora resolution and consistency depend on previously processed text, the problem of connecting a paragraph to such text has been conveniently ignored. Notice that this doesn't make our thesis about paragraphs being units of semantic processing any weaker; we have not claimed that paragraphs are independent. The question of how to translate from natural language to a logical notation needs a lot of attention; we have merely assumed that this can be done. Continuing this list, we have accepted a very classical theory of meaning, given by Tarski: the truth is what is satisfied in a model. This theory should be refined, for instance by formalizing Lakoff's (1987) concept of radial categories and proposing mechanisms for exploiting it. By the same token, the concept of reference has to be broadened to include iconic (e.g., visual and tactile) information. And certainly it would be nice to have a more detailed theory describing the role of the metalevel. In particular, we can imagine that the simple structure of a collection of set-theoretic formulas can be replaced by something more interesting. We leave this as another open problem.
We have shown that it is possible to develop a formal system with an explicit relationship between background knowledge and text, showing mechanisms that take
advantage of preference, coherence, and contradiction (the reality of these phenomena has never been disputed, but their semantic functions had not been investigated). We should also mention that we have begun some work on actually checking the empirical validity of this model (cf. e.g. Braden-Harder and Zadrozny 1990), using on-line dictionaries (LDOCE and Webster's) as the referential level. We know, of course, that existing dictionaries are very imperfect, but (1) they can be accessed and used by computer programs; (2) they are getting better, as they are very systematically created with the help of computers (see Sinclair 1987 for an account of how COBUILD was constructed); (3) obviously, we can imagine useful new ones, like a Dictionary of Pragmatics; and (4) we believe that we can explore the new inference mechanisms even in such unrefined environments.
Although the scheme we have proposed certainly needs further refinement, we believe that it is correct in two of its most important aspects: first, in the separation of current paragraph analysis (the object theory) from background information (the referential level); and second, in asserting that the function of a paragraph is to allow the building of a model of the situation described in the paragraph. This model can be stored, maybe modified, and subsequently used as a reference for processing following paragraphs.
Finally, the paper can be viewed as an argument that the meaning of a sentence cannot be defined outside of a context, just as the truth value of a formula cannot be computed in a vacuum. A paragraph is the smallest example of such a context; it is "a unit of thought."
Acknowledgments
We gratefully acknowledge the help provided by our anonymous but diligent referees, by Graeme Hirst and Susan McRoy, and by our many colleagues at IBM Research, all of whom helped us to avoid the worst consequences of Murphy's Law during the writing of this paper.
References
Allen, J. (1987). Natural Language
Understanding. Benjamin/Cummings.
Menlo Park, California.
Berwick, R.C. (1986). "Learning from
Positive-Only Examples: The Subset
Principle and Three Case Studies." In
Michalski, R.S. et al., eds. Machine
Learning, Vol. II. Morgan Kaufmann. Los
Altos, California: 625-645.
Black, J.B., and Bower, G.H. (1979).
"Episodes as Chunks in Narrative
Memory." Journal of Verbal Learning and
Verbal Behavior 18: 309-318.
Bond, S.J., and Hayes, J.R. (1983). Cues
People Use to Paragraph Text. Manuscript.
Dept. of Psychology, Carnegie-Mellon
University.
Brachman, R.J., Fikes, R.E., and Levesque,
H.J. (1985). "Krypton: A Functional
Approach to Knowledge Representation."
In Brachman, R.J., and Levesque, H.J.,
eds. Readings in Knowledge Representation.
Morgan Kaufmann. Los Altos, California:
411-430.
Braden-Harder, L., and Zadrozny, W. (1989).
Lexicons for Broad Coverage Semantics. IBM
Research Division Report, RC 15568.
Chafe, W.L. (1979). "The Flow of Thought
and the Flow of Language." In Givon, T.,
ed., Syntax and Semantics, Vol. 12.
Academic Press. New York, New York.
Charniak, E. (1983). "Passing Markers: A
Theory of Contextual Influence in
Language Comprehension." Cognitive
Science, 7(3): 171-190.
Charniak, E., and McDermott, D. (1985).
Introduction to Artificial Intelligence.
Addison-Wesley. Reading, Massachusetts.
Crothers, E.J. (1979). Paragraph Structure
Inference. Ablex Publishing Corp.,
Norwood, New Jersey.
Etherington, D.W., and Mercer, R.E. (1987).
"Domain Circumscription: A
Reevaluation." Computational I ntelligence
3: 94--99.
Frisch, A. (1987). "Inference without
Chaining." Proc. I JCAI -87. Milan, Italy:
515-519.
Fowler, H.W. (1965). A Dictionary of Modern
English Usage. Oxford University Press.
New York, New York.
Genesereth, M.R., and Nilsson, N.J. (1987).
Logical Foundations of Artificial Intelligence.
Morgan Kaufmann. Los Altos, California.
Grice, H.P. (1978). "Further Notes on Logic
and Conversation." In Cole, P., ed. Syntax
and Semantics 9: Pragmatics. Academic
Press. New York, New York: 113-128.
Grice, H.P. (1975). "Logic and
Conversation." In Cole, P., and Morgan,
J.L., eds., Syntax and Semantics 3: Speech
Acts. Academic Press. New York, New
York: 41-58.
Graesser, A.C. (1981). Prose Comprehension
Beyond the Word. Springer. New York,
New York.
Grosz, B.J. (1978). "Discourse Knowledge." Section 4 of Walker, D.E., ed., Understanding Spoken Language.
North-Holland. New York, New York:
229-344.
Grosz, B.J. (1977). "The Representation and
Use of Focus in a System for
Understanding Dialogs." Proc. IJCAI-77.
W. Kaufmann. Los Altos, California:
67-76.
Haberlandt, K., Berian, C., and Sandson, J.
(1980). "The Episode Schema in Story
Processing." Journal of Verbal Learning and
Verbal Behavior 19: 635-650.
Halliday, M.A.K., and Hasan, R. (1976).
Cohesion in English. Longman Group Ltd.
London.
Haugeland, J. (1985). Artificial Intelligence: The Very Idea. MIT Press. Cambridge,
Massachusetts.
Hinds, J. (1979). "Organizational Patterns in
Discourse." In Givon, T., ed., Syntax and
Semantics, Vol. 12. Academic Press. New
York, New York.
Hintikka, J. (1985). The Game of Language.
D. Reidel. Dordrecht.
Hirst, G. (1987). Semantic Interpretation and
the Resolution of Ambiguity. Cambridge
University Press. Cambridge.
Hobbs, J.R. (1982). "Towards an
Understanding of Coherence in
Discourse." In Lehnert, W.G., and Ringle,
M.H., eds. Strategies for Natural Language
Processing. Lawrence Erlbaum. Hillsdale,
New Jersey: 223-244.
Hobbs, J.R. (1979). "Coherence and
Coreference." Cognitive Science 3: 67-90.
Hobbs, J.R. (1978). "Resolving Pronoun
References." Lingua 44: 311-338.
Hobbs, J.R. (1977). 38 Examples of Elusive
Antecedents from Published Texts.
Research Report 77-2. Department of
Computer Science, City College, CUNY,
New York.
Hobbs, J.R. (1976). Pronoun Resolution.
Research Report 76-1. Dept. of Computer
Science. City College, CUNY, New York.
Hobbs, J., Stickel, M., Martin, P., and
Edwards, D. (1988). "Interpretation as
Abduction." In Proc. of 26th Annual
Meeting of the Association for Computational
Linguistics, ACL: 95-103.
Jackendoff, R. (1983). Semantics and
Cognition. The MIT Press. Cambridge,
Massachusetts.
Jensen, K. (1988). "Issues in Parsing." In
A. Blaser, ed., Natural Language at the
Computer. Springer-Verlag, Berlin,
Germany: 65-83.
Jensen, K. (1986). Parsing Strategies in a
Broad-Coverage Grammar of English.
Research Report RC 12147. IBM
T.J. Watson Research Center, Yorktown
Heights, New York.
Jensen, K., and Binot, J.-L. (1988).
"Disambiguating Prepositional Phrase
Attachments by Using On-line Dictionary
Definitions." Computational Linguistics
13(3-4): 251-260 (special issue on the
lexicon).
Johnson-Laird, P.N. (1983). Mental Models.
Harvard University Press. Cambridge,
Massachusetts.
Kamp, H. (1981). "A Theory of Truth and
Semantic Representation." In Groenendijk,
J.A.G., et al., eds., Formal Methods in the
Study of Language, I. Mathematisch
Centrum, Amsterdam: 277-322.
Labov, W. (1973). "The Boundaries of Words
and Their Meanings." In Fishman, J., ed., New
Ways of Analyzing Variation in English.
Georgetown U. Press. Washington, D.C.:
340-373.
Lakoff, G. (1987). Women, Fire and Dangerous
Things. The University of Chicago Press.
Chicago, Illinois.
Levesque, H.J. (1984). "A Logic of Implicit
and Explicit Beliefs." Proc. AAAI-84.
AAAI: 198-202.
Longacre, R.E. (1979). "The Paragraph as a Grammatical Unit." In Givon, T., ed.,
Syntax and Semantics, Vol. 12. Academic
Press. New York, New York.
Longman Dictionary of Contemporary English.
(1978). Longman Group Ltd., London.
Lyons, J. (1969). I ntroduction to Theoretical
Linguistics. Cambridge University Press.
Cambridge, England.
Mann, W.C., and Thompson, S.A. (1983).
Relational Propositions in Discourse.
Information Sciences Institute Research
Report 83-115.
Moens, M., and Steedman, M. (1987).
"Temporal Ontology in Natural
Language." Proc. 25th Annual Meeting of
the ACL. Stanford, California: 1-7.
Montague, R. (1970). "Universal Grammar."
Theoria 36: 373-398.
Mycielski, J. (1981). "Analysis without
Actual Infinity." Journal of Symbolic Logic,
46(3): 625-633.
Patel-Schneider, P.S. (1985). "A Decidable
First Order Logic for Knowledge
Representation." Proc. I JCAI -85. AAAI:
455-458.
Perlis, D. (1985). "Languages with Self
Reference I: Foundations." Artificial
Intelligence 25: 301-322.
Poole, D. (1988). "A Logical Framework for
Default Reasoning." Artificial Intelligence
36(1): 27-47.
Quirk" R., Greenbaum, S., Leech, G., and
Svartvik, J. (1972). A Grammar of
Contemporary English. Longman Group
Ltd. London.
Reggia, J.A. (1985). "Abductive Inference."
In Karna, K.N., ed., Expert Systems in Government Symposium. IEEE: 484-489.
Reiter, R. (1987). "A Theory of Diagnosis
from First Principles." Artificial
Intelligence, 32(1): 57-95.
Sidner, C. (1983). "Focusing in the
Comprehension of Definite Anaphora." In
Brady, M., and Berwick, R., eds.,
Computational Models of Discourse, MIT
Press. Cambridge, Massachusetts:
330-367.
Sells, P. (1985). Lectures on Contemporary
Syntactic Theories. CSLI lecture notes;
no. 3.
Shoenfield, J.R. (1987). Mathematical Logic.
Addison-Wesley. New York, New York.
Sinclair, J.M., ed. (1987). Looking Up: An Account of the COBUILD Project. Collins
ELT, London.
Small, S.L., Cottrell, G.W., and Tanenhaus,
M.K., eds. (1988). Lexical Ambiguity
Resolution. Morgan Kaufmann. San Mateo,
California.
Turner, M. (1987). Death is the Mother of
Beauty. The University of Chicago Press.
Chicago, Illinois.
van Dijk, T.A., and Kintsch, W. (1983).
Strategies of Discourse Comprehension.
Academic Press. Orlando, Florida.
Warriner, J.E. (1963). English Grammar and
Composition. Harcourt, Brace & World,
Inc., New York, New York.
Webber, B. (1983). "So What Can We Talk
About Now?" In Brad~ M., and Berwick,
R., eds., Computational Models of Discourse.
MIT Press. Cambridge, Massachusetts:
147-154.
Webber, B. (1987). "The Interpretation of
Tense in Discourse." Proc. 25th Annual
Meeting of the ACL, ACL: 147-154.
Webster's Seventh New Collegiate Dictionary.
(1963). Merriam-Webster, Inc. Springfield,
Massachusetts.
Woods, W. (1987). "Don't Blame the Tool." Computational Intelligence 3(3): 228-237.
Zadrozny, W. (1987a). "Intended Models,
Circumscription and Commonsense
Reasoning." Proc. I JCAI -87: 909-916.
Zadrozny, W. (1987b). "A Theory of Default
Reasoning." Proc. AAAI -87. Seattle,
Washington: 385-390.
Zadrozny, W. (unpublished). The Logic of
Abduction.