Proceedings of the Conference on Language & Technology 2009
The annotation system uses the regular expression technique proposed by Sadler et al. [14] for the annotation. The pipeline model suggested by Cahill et al. [2] is followed in parsing, and the probabilistic model of Collins [5] is used to generate the most probable parse tree.

Figure 1: Architecture of the annotation system. (Parse Tree → Annotation System [Grammar Extractor, Rule Selector, Rule Resolver, LFG Generator; Meta Rules] → Corresponding LFG)

@Anno1 is the same as in Sadler's technique, but @Anno2 is added to define f-descriptions which do not involve a variable on their right-hand side. This part is optional.

The regular expression operators Kleene star * (without argument), optional (…) and disjunction | are allowed on the Rhs of the meta rules. Symbols on the Lhs and Rhs consist of two parts separated by ':', e.g. NP:n1. The first part is the CFG symbol, which must be present in the CFG rule, whereas the second part denotes a variable to which an f-structure will be associated. The syntax of annotation rules is given in Appendix B. Following are some sample meta rules.

(3) VP:vp > * MD:m1 * [VP-A:v1|VP:v1] *
    @[vp:^===m1,vp:^===v1].

(4) S:s > * NP-A:n1 * VP:v1 *
    @[v1:SUBJ===n1,s:^===v1]
    @[v1:^CLAUSE_TYPE=DECLARATIVE].

(5) NPB:npb > * (JJ:j1) NN:n1 *
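To make the matching concrete, the Rhs of a meta rule such as (3) can be compiled into an ordinary regular expression over the space-separated Rhs symbols of a CFG rule. The sketch below is only an illustration of the technique, not the system's code; the function names and the string encoding of rules and symbol sequences are assumptions.

```python
import re

def compile_rhs(meta_rhs):
    """Compile a meta rule's Rhs into a regex over space-terminated
    CFG symbols.  Handles the three operators the paper allows:
    bare Kleene star '*', optionality '(SYM:var)' and
    disjunction '[SYM1:var|SYM2:var]'."""
    parts = []
    for tok in meta_rhs.split():
        if tok == '*':
            # bare Kleene star: any sequence of symbols; non-greedy so
            # that optional symbols standing next to it are still captured
            parts.append(r'(?:\S+ )*?')
        elif tok.startswith('(') and tok.endswith(')'):
            sym, var = tok[1:-1].split(':')
            parts.append(r'(?:(?P<%s>%s) )?' % (var, re.escape(sym)))
        elif tok.startswith('[') and tok.endswith(']'):
            alts = [a.split(':') for a in tok[1:-1].split('|')]
            var = alts[0][1]  # both branches bind the same variable
            body = '|'.join(re.escape(sym) for sym, _ in alts)
            parts.append(r'(?P<%s>%s) ' % (var, body))
        else:
            sym, var = tok.split(':')
            parts.append(r'(?P<%s>%s) ' % (var, re.escape(sym)))
    return re.compile(''.join(parts))

def match_rhs(meta_rhs, cfg_rhs):
    """Return a {variable: matched symbol} binding if the meta rule's
    Rhs matches the CFG rule's Rhs, otherwise None."""
    m = compile_rhs(meta_rhs).fullmatch(' '.join(cfg_rhs) + ' ')
    if m is None:
        return None
    return {v: s for v, s in m.groupdict().items() if s is not None}

# Meta rule (3) applied to the CFG rule VP -> MD VP-A binds m1 to MD
# and v1 to VP-A, so the annotation @[vp:^===m1,vp:^===v1] knows
# which daughters to use.
print(match_rhs('* MD:m1 * [VP-A:v1|VP:v1] *', ['MD', 'VP-A']))
```

The variable bindings returned by the match are what the subsequent annotation part (after '@') would consume to attach f-descriptions to the matched daughters.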
3.2. Grammar extractor

The Grammar Extraction module takes the normalized output of the Collins parser [5] and extracts the context-free grammar rules from the parse tree. Following is an example of a parse tree and the extracted grammar for the sentence "he ate apples with me".

(9) Output of Collins' Parser

The rule selector chooses applicable meta rules for each grammar rule extracted from the tree, on the basis of valid regular expression matching of the Rhs. The generic process for rule selection is as follows.

For each CFG rule
    For each meta rule whose LHS = CFG rule's LHS
        If (meta rule's RHS matches CFG rule's RHS)
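The extraction step described in Section 3.2 can be sketched as follows. The bracketed tree below is a simplified, hypothetical stand-in for the Collins output (it is not the paper's actual tree (9)), and all function names are illustrative rather than the system's own.

```python
import re

def parse_tree(s):
    """Parse a bracketed parse tree such as '(NPB he/PRP )' into
    (label, children) pairs; a leaf 'word/TAG' is reduced to its POS
    tag, which is what the extracted CFG rules refer to."""
    tokens = re.findall(r'\(|\)|[^\s()]+', s)
    pos = 0
    def node():
        nonlocal pos
        pos += 1                      # consume '('
        label = tokens[pos]; pos += 1
        children = []
        while tokens[pos] != ')':
            if tokens[pos] == '(':
                children.append(node())
            else:                     # leaf: keep only the tag
                children.append((tokens[pos].rsplit('/', 1)[1], []))
                pos += 1
        pos += 1                      # consume ')'
        return (label, children)
    return node()

def extract_cfg(tree, rules=None):
    """Emit one CFG rule (lhs, rhs) per internal node, in pre-order."""
    if rules is None:
        rules = []
    label, children = tree
    if children:
        rules.append((label, [c[0] for c in children]))
        for c in children:
            extract_cfg(c, rules)
    return rules

# A simplified, hypothetical tree for "he ate apples with me":
tree = parse_tree(
    "(TOP (S (NP-A (NPB he/PRP)) "
    "(VP ate/VBD (NP-A (NPB apples/NNS)) "
    "(PP with/IN (NP-A (NPB me/PRP))))))"
)
for lhs, rhs in extract_cfg(tree):
    print(lhs, '->', ' '.join(rhs))
```

Each extracted (Lhs, Rhs) pair is then fed to the rule selector, which runs the nested matching loop shown above.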
erroneous. This way the system makes a correct annotation 98.58% of the time.

    Correct rule instances      1485
    Incorrect rule instances      21
    % Correct                  98.58%

This paper reports precision as the number of correct annotations over the total annotations the system made. Sadler reported 93.38% precision for that system; the precision this paper reports is 93.69%, which is almost equal to Sadler's.

5. Discussion

The pipeline model proposed by Cahill et al. [2] is used by the system because it took less development time and effort. An already existing PCFG model developed by Collins [5] was used for the first phase of CFG parsing, and annotation was then performed on the most probable tree generated by the model. The meta rules developed for the system are equally applicable to the integrated model of Cahill et al. [2], so the system can be modified to use an integrated model in the future.

The reason for 7 wrong annotations was an incorrect parse tree generated by the parser. Such incorrect grammar rules were not handled in the meta rules. Such rules can be added to the system if they do not clash with already existing correct rules. For example, the phrase "Armed with knowledge" is expected to be parsed as in (12).

(12) Correct parse of "Armed with knowledge"
(NPB Armed/VBN
  (PP-A with/IN
    (NP-A
      (NPB knowledge/NN) ) ) )

But the parser parsed it the following way, creating a wrong CFG rule which was not handled in the meta rules.

(13) Incorrect parse of "Armed with knowledge"
(PP Armed/VBN
  (PP-A with/IN
    (NP-A
      (NPB knowledge/NN) ) ) )

There were 12 meta rules found missing during testing that caused wrong annotations. Such rules will be added to the system in the next enhancement to increase its accuracy. More such rules are likely to surface, as the data observed to develop the meta rules is very limited.

One problem encountered is the incorrect identification of SUBJ/OBJ in the case of wh-sentences.

(14) "what is your name"
(TOP
  (SBARQ
    (WHNP what/WP )
    (SQ
      (VP is/VBZ
        (NP-A
          (NPB your/PRP$ name/NN ) ) ) ) ) )

(15) "what is he doing"
(TOP
  (SBARQ
    (WHNP what/WP )
    (SQ
      is/VBZ
      (NP
        (NPB he/PRP ) )
      (VP doing/VBG ) ) ) )

In sentence (14) "what" is the subject of the sentence, whereas in (15) "what" acts as the object of the sentence. But this cannot be detected correctly because the same CFG rule "SBARQ → WHNP SQ" is used in both sentences.

Currently, when multiple meta rules are applicable, one set of rules with maximum coverage is selected arbitrarily. Such a selection may result in a wrong analysis in some cases. A probabilistic model could be added to the system for more accurate selection of rules. Only one occurrence of such a wrong selection was observed during testing among the 21 wrong annotations.

The system uses a subset of the regular expression operators. Kleene star '*' with an argument, positive Kleene '+', optionality '?' and complement '~' are the operators not used in rule writing.

Compared with Cahill [2], long-distance dependencies of traces are intentionally not used. Traces and wh-movements are traced in the lower hierarchies of the tree, which is not yet handled in this paper. The paper follows most of Sadler's work, which is not meant to cover the traces of movements deep in the tree. However, Frank [7] has presented an idea for resolving these depth dependencies.
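The maximum-coverage selection discussed above can be read, for instance, as preferring the applicable meta rule whose Rhs names the most concrete symbols. The sketch below is one such hypothetical reading; the coverage measure, the data layout and the names are assumptions, not the paper's definition.

```python
def coverage(meta_rhs):
    """Number of Rhs tokens naming concrete symbols (everything except
    the bare Kleene star) -- one plausible reading of 'coverage'."""
    return sum(1 for tok in meta_rhs.split() if tok != '*')

def pick_by_coverage(applicable):
    """Among meta rules that matched a CFG rule, keep the one with
    maximum coverage; max() resolves ties by taking the first, which
    mirrors the arbitrary choice the paper describes."""
    return max(applicable, key=lambda rule: coverage(rule[1]))

# Hypothetical applicable meta rules as (Lhs, Rhs, annotation) triples.
applicable = [
    ('VP', '* VP:v1 *',                   '@[vp:^===v1]'),
    ('VP', '* MD:m1 * [VP-A:v1|VP:v1] *', '@[vp:^===m1,vp:^===v1]'),
]
print(pick_by_coverage(applicable)[2])
```

A probabilistic model, as suggested above, would replace this count-based key with a learned score.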
[1] J. Bresnan, Lexical Functional Syntax. Blackwell Publishers, Oxford, 2001.

[2] A. Cahill, M. McCarthy, J. van Genabith and A. Way, "Parsing with PCFGs and Automatic F-Structure Annotation", in Proceedings of the Seventh International Conference on LFG, pp. 76-95, CSLI Publications, Stanford, CA, 2002a.

[3] A. Cahill, M. McCarthy, J. van Genabith and A. Way, "Automatic Annotation of the Penn-Treebank with LFG F-Structure Information", in Proceedings of the LREC Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Language Data, pp. 8-15, Las Palmas, Canary Islands, Spain, 2002b.

[4] A. Cahill, M. McCarthy, J. van Genabith and A. Way, "Evaluating Automatic F-Structure Annotation for the Penn-II Treebank", in Proceedings of the Treebanks and Linguistic Theories (TLT'02) Workshop, Sozopol, Bulgaria, 2002b.

[5] M. Collins, Head-Driven Statistical Models for Natural Language Parsing. PhD dissertation, University of Pennsylvania, 1999.

[6] M. Dalrymple, Lexical Functional Grammar. Academic Press, San Diego, CA / London, 2001.

[7] A. Frank, "Automatic F-Structure Annotation of Treebank Trees", in M. Butt and T.H. King (eds.), Proceedings of the LFG00 Conference, University of California at Berkeley, CSLI Online Publications, Stanford, 2000.

[8] A. Frank, L. Sadler, J. van Genabith and A. Way, "From Treebank Resources to LFG F-Structures", in A. Abeillé (ed.), Treebanks: Building and Using Syntactically Annotated Corpora, pp. 367-389, Kluwer Academic Publishers, The Netherlands, 2002.

[9] R. Kaplan and J. Bresnan, "Lexical Functional Grammar: a formal system for grammatical representation", in Bresnan,

[13] P. Kroeger, Phrase Structure and Grammatical Relations in Tagalog. CSLI, Stanford, 1995.

[14] L. Sadler, J. van Genabith and A. Way, "Automatic F-Structure Annotation from the AP Treebank", in Proceedings of the Fifth International Conference on Lexical-Functional Grammar, The University of California at Berkeley, CSLI Publications, Stanford, CA, 2000.

[15] J. van Genabith, L. Sadler and A. Way, "Data-Driven Compilation of LFG Semantic Forms", in EACL'99 Workshop on Linguistically Interpreted Corpora (LINC-99), pp. 69-76, Bergen, Norway, June 12, 1999a.

[16] J. van Genabith, L. Sadler and A. Way, "Structure Preserving CF-PSG Compaction, LFG and Treebanks", in Proceedings of the ATALA Workshop on Treebanks, Journées ATALA, Corpus annotés pour la syntaxe, pp. 107-114, Université Paris 7, France, 18-19 June 1999, 1999b.

[17] J. van Genabith, A. Way and L. Sadler, "Semi-Automatic Generation of F-Structures from Tree Banks", in M. Butt and T. King (eds.), Proceedings of the LFG99 Conference, Manchester University, 19-21 July, CSLI Online Publications, Stanford, CA, 1999c.

Appendix A: Sample Tree and Corresponding LFG Generated by the System

"He said the tests confirmed Tehran had missiles with a limited range of up to 2,000km"

Input Parse Tree:
(TOP~said~1~1
  (S~said~2~2
    (NP-A~He~1~1
      (NPB~He~1~1 He/PRP ) )
    (VP~said~2~1 said/VBD
      (SBAR-A~confirmed~1~1
        (S-A~confirmed~2~2