
World Academy of Science, Engineering and Technology 5 2005

A Computational Model for Resolving Pronominal Anaphora in Turkish Using Hobbs’ Naïve Algorithm
Pınar Tüfekçi and Yılmaz Kılıçaslan

Manuscript received March 31, 2005.
Pınar Tüfekçi is with the Electronics and Communication Engineering Department, Çorlu Faculty of Engineering, Trakya University, Tekirdağ, Turkey (phone: +90-282-652 94 75; fax: +90-282-652 93 72; e-mail: pinart@corlu.edu.tr).
Yılmaz Kılıçaslan is with the Computer Engineering Department, Faculty of Engineering and Architecture, Trakya University, Edirne, Turkey (e-mail: yilmazk@trakya.edu.tr).

Abstract—In this paper we present a computational model for pronominal anaphora resolution in Turkish. The model is based on Hobbs’ Naïve Algorithm [4, 5, 6], which exploits only the surface syntax of sentences in a given text.

Keywords—Anaphora Resolution, Pronoun Resolution, Syntax-based Algorithms, Naïve Algorithm.

I. INTRODUCTION

Anaphoric dependence is a relation between two linguistic expressions such that the interpretation of one, called anaphora, is dependent on the interpretation of the other, called antecedent. The problem of anaphora resolution is to find the antecedent(s) for every anaphora [7]. A model or algorithm for carrying out such a resolution process will be an essential component of any speech or text understanding system intended to handle realistic discourse or text fragments satisfactorily [2].

To speak more specifically, anaphora resolution, which most commonly appears as pronoun resolution, is the problem of resolving references to other items in the discourse. These items are usually noun phrases representing objects in the real world, called referents, but can also be verb phrases, whole sentences or paragraphs.

Anaphora resolution is classically recognized as a very difficult problem in Natural Language Processing [2, 12, 13]. Work on anaphora resolution in the open literature tends to fall into three domains: artificial intelligence (as a specialty of computer science, including computational linguistics and natural language processing), classical linguistics (as distinguished from computational linguistics), and cognitive psychology. Psychologists tend to be interested in this topic because of their interest in how the brain processes language. Linguists are interested in anaphora resolution simply because this is a classical problem in the field [2]. For our purposes we are primarily interested in the AI/computational linguistics approach. We will only be concerned with computational approaches to pronominal anaphora resolution, namely algorithms that have been implemented on a computer in Prolog.

The aim of this paper is to implement a system that is based on Hobbs’ Naïve Algorithm for pronominal anaphora resolution in Turkish. The system processes low-level information, using syntactic knowledge to collect possible antecedents of pronouns. Future work will determine the most plausible candidate by means of higher-level information, using semantic and pragmatic knowledge. The relevant literature on pronoun interpretation ([5], [8], [15]) showed that a success rate of 80% is feasible when employing syntactic information alone for English. Again, as part of our future work we intend to compare Turkish and English with respect to their rate of success.

To the best of our knowledge, [18]’s BABY-SIT is the sole computational work that is intended to deal with anaphora resolution in Turkish, along with many other aspects of the language [20]. [18] uses situation-theoretic tools and notions. [20] is another computational work, based on Centering Theory, that deals with pronominal anaphora resolution in Turkish; it particularly exploits the findings arrived at by applying this theory to Turkish.

II. THE SYNTACTIC APPROACH

A. Types of Anaphora
There are primarily three types of anaphora:
- Pronominal: This is the most common type, where a referent is referred to by a pronoun.
- Definite noun phrase: The antecedent is referred to by a phrase of the form “<the><noun phrase>”.
- Quantifier/Ordinal: The anaphor is a quantifier such as ‘one’ or an ordinal such as ‘first’ [14].
Pronominal anaphora are the most commonly encountered in general usage. This category includes three subclasses: personal, demonstrative and reflexive [2]. Pronominal anaphora in English and Turkish are shown in Table I [21].

TABLE I. PRONOMINAL ANAPHORA
Pronominal Anaphora in English            Pronominal Anaphora in Turkish
Personal  Demonstrative  Reflexive        Personal  Demonstrative  Reflexive
he        this           himself          o         bu             kendi
she       that           herself          onu       bunu           kendisi
it        these          itself           onun      bunun          kendim
his       those          themselves       onlar     bunlar         kendin
her       others         -                onları    bunları        kendimiz
him       -              -                onların   bunların       kendiniz
its       -              -                -         şu             kendileri
they      -              -                -         şunu           -
them      -              -                -         şunun          -
their     -              -                -         şunlar         -
-         -              -                -         şunları        -
-         -              -                -         şunların       -


For the purpose of this study, we will narrow down the scope of anaphoric phenomena and focus on a sub-problem of anaphora resolution, namely the resolution of 3rd person singular pronominal anaphora to noun-phrase antecedents. Most algorithms in the literature resolve the English pronouns ‘he’, ‘she’, ‘it’, ‘him’, ‘her’, ‘his’ and ‘its’ whenever they have an antecedent which is a noun phrase. The algorithm we offer in this study will resolve the Turkish pronouns ‘o’, ‘onu’, ‘onun’ and ‘kendi’ whenever they have an antecedent which is a noun phrase.

B. The Naïve Algorithm
In his 1977 paper, Hobbs presents two algorithms for pronominal anaphora resolution: a syntax-based algorithm, known as the Naïve Algorithm, and a semantic algorithm. We will concentrate here on the Naïve Algorithm for finding antecedents of pronouns.

The Naïve Algorithm consists of a single resolution procedure based on traversing full parse trees, starting from the pronoun, in search of an appropriate antecedent. The algorithm assumes that the data is presented in the format of parse trees produced by a particular grammar, namely one where an NP node dominates an N-bar node, to which arguments of the head noun attach. The algorithm traverses the tree from the pronoun up, stopping on certain S, NP and VP nodes and searching left-to-right, breadth-first in the subtrees dominated by these nodes.

It will be necessary to assume that an NP node has an Nbar node below it, as proposed by Chomsky [1], to which a prepositional phrase containing an argument of the head noun may be attached. Truly adjunctive prepositional phrases are attached to the NP node in English. This assumption, or something equivalent to it, is necessary to distinguish between sentences (1) and (2) in English [6]. It is worth noting that where English has a prepositional phrase, Turkish uses an NP with locative case.

(1) Mr. Smith_i saw a driver_k in his_i,k truck.
(2) Mr. Smith_i saw a driver of his_i truck.

In sentence (1) ‘his’ may refer to Mr. Smith or the driver, but in sentence (2) it may not refer to the driver. The structures for the relevant noun phrases in sentences (1) and (2) are shown in Fig. 1.

(1)  [NP [det a] [Nbar driver] [PP in [NP [det [NP he] 's] [Nbar truck]]]]
(2)  [NP [det a] [Nbar driver [PP of [NP [det [NP he] 's] [Nbar truck]]]]]
Fig. 1. The structures for NPs of sentences (1) and (2).

We translate sentence (1) from English to Turkish in four different forms, as indicated in sentences (3), (4), (5) and (6).

(3) Mr. Smith bir şoför-ü_i o-nun_i,k kamyon-u-n-da gör-dü.
    Mr. Smith one driver-ACC s/he-GEN-3.SG truck-POSS-3.SG-LOC see-PAST.
    ‘Mr. Smith saw a driver in his truck.’

(4) Mr. Smith_i bir şoför-ü_k Ø_i,k kamyon-u-n-da gör-dü.
    Mr. Smith one driver-ACC truck-POSS-3.SG-LOC see-PAST.
    ‘Mr. Smith saw a driver in (his) truck.’

(5) Mr. Smith_i o-nun_k kamyon-u-n-da bir şoför gör-dü.
    Mr. Smith s/he-GEN-3.SG truck-POSS-3.SG-LOC one driver see-PAST.
    ‘Mr. Smith saw a driver in his truck.’

(6) Mr. Smith_i Ø_i kamyon-u-n-da bir şoför gör-dü.
    Mr. Smith truck-POSS-3.SG-LOC one driver see-PAST.
    ‘Mr. Smith saw a driver in (his) truck.’

Sentences (3), (4), (5) and (6) contain some ambiguities. Let us look at them one by one.

In sentence (3) “onun” may be co-referential with “şoför” or with another person in the previous sentences, as the parse tree of (3) shows in Fig. 2. The syntactic tree structures of Turkish which are used in this study are based on [11, 9].

[S1 (Previous Sentence)
  [NP [Nbar [noun Mr. Smith]]]
  [VP2 [NP2_acc (Antecedent) [det bir] [Nbar [noun şoförü]]]
       [VP1 [NP1_loc [pronoun onun (Anaphora)] [NP_loc [noun kamyonunda]]]
            [VP [verb gördü]]]]]
Fig. 2. The illustration of the parse tree of sentence (3) and the algorithm working on it.

The subject of the possessive NP can be null in Turkish [19]. In sentence (4) there is a null pronoun just before the object “kamyonunda” and it may be co-referential with “Mr. Smith” or “şoför”. This null pronoun behaves either like the genitive 3rd person singular pronoun “onun” or like the reflexive pronoun “kendi” when the NP has a possessive 3rd person singular noun. If the null pronoun behaves like a GEN.3.SG pronoun, it is interpreted as co-referential with “şoför”. If the null pronoun behaves like a reflexive pronoun, it is interpreted as co-referential with “Mr. Smith”, as the parse tree of sentence (4) shows in Fig. 3.
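
The two readings of a null possessor described for sentence (4) can be enumerated mechanically. The following is only a minimal sketch under stated assumptions: the names `Reading` and `null_possessor_readings` are hypothetical and are not part of the authors' Prolog system, and the constituents are passed in as plain strings rather than parse-tree nodes. The fallback to a single ‘kendi’ reading when no accusative NP precedes the possessive NP follows the rule the paper states for sentence (6).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Reading:
    treated_as: str   # overt pronoun the null possessor is read as
    antecedent: str   # constituent it is then taken to co-refer with


def null_possessor_readings(subject: str,
                            preceding_accusative: Optional[str] = None) -> List[Reading]:
    """Enumerate the readings of a null possessor (hypothetical helper).

    The 'kendi' reading (co-reference with the subject) is always available;
    the 'onun' reading is added only when an accusative NP precedes the
    possessive NP, as in sentence (4).
    """
    readings = [Reading("kendi", subject)]
    if preceding_accusative is not None:
        readings.append(Reading("onun", preceding_accusative))
    return readings


# Sentence (4): subject "Mr. Smith", preceding accusative NP "şoför".
for reading in null_possessor_readings("Mr. Smith", "şoför"):
    print(reading.treated_as, "->", reading.antecedent)
```

Trying ‘kendi’ first and ‘onun’ second mirrors the order in which the reformulated algorithm applies the two conversions.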


[S1 (Previous Sentence)
  [NP3 (Antecedent-2) [Nbar [noun Mr. Smith]]]
  [VP2 [NP2_acc (Antecedent-1) [det bir] [Nbar [noun şoförü]]]
       [VP1 [NP1_loc [null_pronoun 'onun' or 'kendi' (Anaphora)] [NP_loc [noun kamyonunda]]]
            [VP [verb gördü]]]]]
Fig. 3. The illustration of the parse tree of sentence (4) and the algorithm working on it.

In sentence (5) “onun” may be co-referential with another phrase in the previous sentences, as the parse tree of sentence (5) shows in Fig. 4.

[S1 (Previous Sentence)
  [NP [Nbar [noun Mr. Smith]]]
  [VP1 [NP1_loc [pronoun onun (Anaphora)] [NP_loc [Nbar [noun kamyonunda]]]]
       [VP [NP_nom [det bir] [Nbar [noun şoför]]] [verb gördü]]]]
Fig. 4. The illustration of the parse tree of sentence (5) and the algorithm working on it.

In sentence (6) there is also a null pronoun just before the phrase “kamyonunda”. The null pronoun behaves like the reflexive pronoun “kendi” and, hence, it becomes co-referential with “Mr. Smith”. The parse tree of sentence (6) is shown in Fig. 5.

[S1
  [NP2 (Antecedent) [Nbar [noun Mr. Smith]]]
  [VP1 [NP1_loc [null_pronoun kendi (Anaphora)] [NP_loc [Nbar [noun kamyonunda]]]]
       [VP [NP_nom [det bir] [Nbar [noun şoför]]] [verb gördü]]]]
Fig. 5. The illustration of the parse tree of sentence (6) and the algorithm working on it.

According to [19] and [3], the subject of a possessive NP must be null when it is coreferential with the matrix subject, as in sentence (7a); if the possessive is informationally focused, the reflexive pronoun kendi ‘own/self’ is used, as in sentence (7b). An overt genitive pronoun forces disjoint reference irrespective of whether the antecedent precedes or follows the pronoun, as shown in sentences (7c) and (7d):

(7) a. Ahmet_i [Ø_i anne-si-n-i] sev-er.
       Ahmet mother-POSS-3.SG-ACC love-AOR.
       ‘Ahmet loves (his) mother.’

    b. Ahmet_i [kendi_i anne-si-n-i] sev-er.
       Ahmet self/own mother-POSS-3.SG-ACC love-AOR.
       ‘Ahmet loves his own mother.’

    c. Ahmet_i [o-nun_k anne-si-n-i] sev-er.
       Ahmet he-GEN-3.SG mother-POSS-3.SG-ACC love-AOR.
       ‘Ahmet loves his mother.’

    d. [O-nun_k anne-si-n-i] sev-er Ahmet_i.
       He-GEN-3.SG mother-POSS-3.SG-ACC love-AOR Ahmet.
       ‘Ahmet loves his mother.’

In our opinion, if there is no accusative NP node preceding a possessive NP which has a null pronoun, the null pronoun is used just like the reflexive pronoun “kendi”, as in sentence (6). This reflexive pronoun co-refers with the subject of the sentence, as in sentences (6) and (7a). If there is an accusative NP preceding a possessive NP which has a null pronoun, the null pronoun is used like ‘kendi’ or ‘onun’, as in sentence (4). In that case the null pronoun may co-refer with the subject of the sentence when it is read as ‘kendi’, or with the accusative NP preceding the possessive NP when it is read as ‘onun’.

C. Reformulation of the Naïve Algorithm for Turkish
We have reformulated Hobbs’ Naïve Algorithm so that it can be applied to Turkish. We have incorporated some new rules into the algorithm, as indicated below:
1. Begin at the NP node which immediately dominates a pronoun (‘o’, ‘onu’, ‘onun’ or ‘kendi’) or a null pronoun. If the NP node immediately dominates a pronoun, continue to step 4.
2. Convert the null pronoun immediately dominated by the NP node to the pronoun ‘onun’ and to the pronoun ‘kendi’, and apply the rest of the algorithm for each of these conversions separately. Firstly, apply the algorithm for ‘kendi’ and continue to step 4.
3. Secondly, apply the algorithm for ‘onun’ and continue to step 4.
4. Go up the tree to the first NP or VP node encountered. Call this node X and call the path used to reach it p.


5. If the pronoun is ‘kendi’, continue to step 8.
6. If X is an NP node, traverse all branches below node X to the left of path p in a left-to-right, breadth-first fashion. Propose as the antecedent any accusative NP node that is immediately dominated by X, or any accusative NP node encountered that has an NP, VP or S node between it and X.
7. If X is a VP node, traverse all the other branches below node X except path p. Propose as the antecedent any accusative NP node that is immediately dominated by X, or any accusative or genitive NP node encountered that has an NP, VP or S node between it and X.
8. From node X go up the tree to the first NP, VP or S node encountered. Call this new node X, and the path traversed to reach it p. If X is an NP node or a VP node, continue to step 5. If X is an S node, continue to step 9.
9. If the pronoun is ‘kendi’, the antecedent is a nominative or genitive case-marked NP preceding it. If the pronoun is not ‘kendi’, continue to step 10.
10. If node X is the highest S node in the sentence, traverse the surface parse trees of previous sentences in the text in order of recency, the most recent first; each tree is traversed in a left-to-right, breadth-first manner, and when an NP node is encountered, it is proposed as the antecedent. If X is not the highest S node in the sentence, continue to step 11.
11. From node X, go up the tree to the first NP, VP or S node encountered. Call this new node X, and call the path traversed to reach it p.
12. If X is an NP node and if the path p to X did not pass through the Nbar node that X immediately dominates, propose X as the antecedent.
13. If X is an NP node and if the path p passed through the Nbar node that X immediately dominates, traverse all branches below node X to the left of path p in a left-to-right, breadth-first manner. Propose any NP node encountered as the antecedent.
14. If X is a VP or S node, traverse all branches of node X to the right of path p in a left-to-right, breadth-first manner, but do not go below any NP, VP or S node encountered. Propose any NP node encountered as the antecedent.
15. Go to step 10.

As [6] points out, a breadth-first search of a tree is one in which every node of depth n is visited before any node of depth n+1.

Figures 2, 3, 4 and 5 illustrate the algorithm working on sentences (3), (4), (5) and (6). Figures 6 and 7 illustrate the algorithm working on sentence (8b), the Turkish translation of the English sentence (8a), to determine the antecedent of each anaphora.

(8) a. The castle in Camelot remained the residence of the king until 536 when he moved it to London [6].

    b. Camelot-ta-ki kale, kral-ın o-nu Londra-ya taşı-dı-ğı 536-ya kadar, o-nun rezidans-ı kal-dı.
       Camelot-LOC-REL castle, king-GEN it-ACC London-DAT move-PAST-ACC 536-DAT until, s/he-GEN-3.SG residence-ACC remain-PAST.

Beginning from node NP1, which immediately dominates the pronoun ‘onu’, step 4 rises to node NP2. Step 5 does not apply, because the pronoun is not ‘kendi’, so control passes from step 4 to step 6. Step 6 searches the left portion of NP2’s tree but finds no eligible NP node. Step 7 does not apply. Step 8 rises to node NP3 and control passes from step 8 to step 5. Step 6 searches the left portion of NP3’s tree but finds no eligible NP node. Step 7 does not apply. Step 8 rises to node VP1 and control passes from step 8 to step 5. Step 6 does not apply, so control passes to step 7. Step 7 searches all branches below node VP1 except path p and proposes NP4 as the antecedent. In this way ‘rezidansı’ is correctly determined as the antecedent of ‘onu’, as shown in Fig. 6.

[Figure: parse tree of sentence (8b). ‘onu’, immediately dominated by NP1, is marked as the anaphora; the search rises through NP2, NP3 and VP1, and ‘rezidansı’, under NP4, is marked as the antecedent.]
Fig. 6. The illustration of the parse tree of sentence (8b), the algorithm working on it and the determination of the antecedent of anaphora ‘onu’.

If we search for the antecedent of ‘onun’, beginning from node NP1, which immediately dominates the pronoun ‘onun’, step 4 rises to node VP1. Step 5 does not apply, because the pronoun is not ‘kendi’. Step 6 does not apply either, so control passes from step 4 to step 7. Step 7 searches all branches below node VP1 except path p. It first proposes NP2 as the antecedent, so that ‘536-ya’ is recommended as the antecedent of ‘onun’. The algorithm can be improved somewhat by applying simple selectional constraints, such as: dates, places and large fixed objects can’t move [6]. After NP2 is rejected, step 7 proposes NP3, finally lighting upon ‘kralın’ as the antecedent of ‘onun’, as shown in Fig. 7.
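
The interplay between breadth-first proposal and selectional constraints can be illustrated with a short sketch. It is not the authors' Prolog implementation: the `Node` class, the `sem` semantic-class labels and the toy tree below are assumptions made for illustration only, and the tree is deliberately much smaller than the parse tree of sentence (8b). The sketch visits every node of depth n before any node of depth n+1, proposes NP nodes in that order, and skips candidates whose assumed semantic class is one that cannot move.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

# Assumed semantic classes for the constraint quoted from [6]:
# dates, places and large fixed objects cannot move.
IMMOBILE = {"date", "place", "large_fixed_object"}


@dataclass
class Node:
    label: str                      # syntactic category, e.g. "VP" or "NP_dat"
    word: Optional[str] = None      # surface form, given here only for NP nodes
    sem: Optional[str] = None       # hypothetical semantic class of the NP
    children: List["Node"] = field(default_factory=list)


def breadth_first(root: Node) -> Iterator[Node]:
    """Yield nodes level by level, left to right within each level."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node.children)


def first_acceptable(root: Node) -> Optional[str]:
    """Return the first NP proposed in breadth-first order that passes the constraint."""
    for node in breadth_first(root):
        if node.label.startswith("NP") and node.sem not in IMMOBILE:
            return node.word
    return None


# Toy tree (an assumption, not Fig. 7): the date is shallower than the king,
# so it is proposed first and then rejected by the selectional constraint.
tree = Node("VP", children=[
    Node("NP_dat", "536'ya", "date"),
    Node("PP", children=[Node("NP_gen", "kralın", "person")]),
])
print(first_acceptable(tree))  # kralın
```

Keeping the proposal order and the filter separate reflects the division of labour described in the text: syntax alone supplies the ordered candidates, and any semantic knowledge is applied afterwards as a veto.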


[Figure: parse tree of sentence (8b). ‘onun’, immediately dominated by NP1, is marked as the anaphora; NP2 dominates ‘536'ya’, and ‘kralın’, under NP3, is marked as the antecedent.]
Fig. 7. The illustration of the parse tree of sentence (8b), the algorithm working on it and the determination of the antecedent of anaphora ‘onun’.

III. CONCLUSION

We have implemented a version of Hobbs’ Naïve Algorithm for Turkish by reformulating the algorithm and incorporating some new rules into it. For issues relating to Turkish, we have relied upon the thematic hierarchy proposed by [10, 20]. The algorithm has so far been tested successfully on 10 toy sentences.

The idea we propose is to implement a system for pronoun resolution that locates likely antecedents according to syntactic information. Better models resulting from our future work will then be able to select the most suitable antecedent according to whether the corresponding logical form of the sentence is consistent with semantic and pragmatic axioms.

REFERENCES
[1] Chomsky, N., “Remarks on nominalization.” In: R. Jacobs and P. Rosenbaum (eds.), Readings in Transformational Grammar, 184-221. Waltham, Mass.: Blaisdell, 1970.
[2] Denber, M., “Automatic Resolution of Anaphora in English”, Technical Report, Eastman Kodak Co. Imaging Science Division, June 30, 1998; http://www.wlv.ac.uk/~le1825/anaphora_resolution_papers/denber.ps
[3] Erguvanlı-Taylan, E., “Pronominal versus Zero Representation of Anaphora in Turkish”, Studies in Turkish Linguistics, 1986.
[4] Hobbs, J.R., “Pronoun Resolution,” Research Report 76-1, Department of Computer Sciences, City College, City University of New York, August 1976.
[5] Hobbs, J.R., “38 Examples of Elusive Antecedents from Published Texts,” Research Report 77-2, Department of Computer Sciences, City College, City University of New York, August 1977.
[6] Hobbs, J.R., “Resolving Pronoun References,” Lingua, Vol. 44, pp. 311-338. Also in Readings in Natural Language Processing, B. Grosz, K. Sparck-Jones, and B. Webber, editors, pp. 339-352, Morgan Kaufmann Publishers, Los Altos, California, 1978.
[7] Huang, Y., “Anaphora: A Cross-linguistic Approach,” New York: Oxford University Press, 2000.
[8] Kennedy, C. and Boguraev, B., “Anaphora for everyone: Pronominal Anaphora Resolution without a Parser.”, COLING 96 pages 113-118 (89), 1996.
[9] Kennelly, S.D., “Nonspecific External Arguments in Turkish”, Dilbilim Araştırmaları 7, İstanbul, p. 58-75, 1997.
[10] Kılıçaslan, Y., “Information packaging in Turkish.” Unpublished MSc. Thesis, University of Edinburgh, Edinburgh, 1994.
[11] Kılıçaslan, Y., “A Situation-Theoretic Approach to Case Marking Semantics in Turkish”, Lingua, 2005.
[12] Mitkov, R., “Anaphora Resolution: The State of the Art”, COLING’98/ACL’98 tutorial on anaphora resolution, University of Wolverhampton, 1999.
[13] Mitkov, R., “Anaphora Resolution”, Pearson Education, ISBN 0 582 32505 6, 2002.
[14] Sayed, I.Q., “Issues in Anaphora Resolution”, http://www.ceng.metu.edu.tr/courses/ceng463/project/BurakAysegul/project_report.pdf
[15] Lappin, S. and Leass, H.J., “An Algorithm for Pronominal Anaphora Resolution.”, Computational Linguistics 20(4): 535-561, 1994.
[16] Tetreault, J., “Analysis of Syntax-based Pronoun Resolution Methods”. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 602-605, 1999.
[17] Tetreault, J., “A Corpus-based Evaluation of Centering and Pronoun Resolution.” Computational Linguistics, 2000.
[18] Tın, E. and Akman, V., “Situated Processing of Pronominal Anaphora”, Bilkent University, Ankara, 1998.
[19] Turan, Ü.D., “Null vs. Overt Subjects in Turkish Discourse: A Centering Analysis”, Ph.D. Dissertation, 1996.
[20] Yıldırım, S., Kılıçaslan, Y. and Aykaç, R.E., “A Computational Model for Anaphora Resolution in Turkish via Centering Theory: an Initial Approach.”, 124-128. ICCI 2004. ISBN 975-98458-1-4, 2004.
[21] Yüksel, Ö., “Contextually Appropriate Anaphor/Pronoun Generation for Turkish”, MSc. Thesis, The Middle East Technical University, 1997.
