Messages in a Limited Domain
¹ This work was sponsored by the Defense Advanced Research Projects Agency. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Air Force.
² We are also working on Korean-to-English text translation on the same domain, which we do not include in this paper.
³ Refer to (Kim, 1994) for other ongoing efforts in English/Korean text translation including (Choi, 1994). See (Lee, 1995) for speech translation work with Korean as the source language.
⁴ Both modules are developed under ARPA sponsorship by the Spoken Language Systems Group at the MIT Laboratory for Computer Science.
[Figure: system architecture diagram; the caption did not survive extraction. Recoverable labels: Command & Control; Common Coalition Language (CCL) Network; Semantic Frame; English/CCL, Korean/CCL, and French/CCL Translation Systems.]

Figure 2: Process Flow of Text Translation
[Diagram: Language Understanding (TINA, MIT/LCS) using the MUC-II analysis grammar for English, followed by Language Generation (GENESIS, MIT/LCS) using the MUC-II generation grammar for Korean.]

Table 1: Sample English/Korean Bilingual Lexicon

be              V i ROOT i PRES i PAST iess
[illegible]     V2 V [...] ING goiss PP ess PRES n PAST ess
cause_en        V2 cholaytoy
visually        AV sikakulo
cap_aircraft    N centhwu cengchalki

[...] representation makes the TINA system an ideal tool for machine translation for various (i) purposes (i.e., whether a detailed or rough translation is required), and (ii) languages (e.g., some languages require a more elaborate tense representation or honorification than others, and the appropriate categories can be easily added).
Figure 3: Parse Tree
[Parse-tree diagram not recoverable from the extraction; legible node labels include "full parse" and "statement".]
{c statement
   :time_expression {p numeric_time
      :topic {q gmt
         :pred {p numeric
            :topic 819 } } }
   :topic {q ship_name
      :name "sterett"
      :pred {p ship_mod
         :topic "uss" } }
   :subject 1
   :pred {p taken_under_fire
      :mode "pp"
      :pred {p v_by
         :topic {q ship_name
            :name "kirov" } }
      :pred {p v_with
         :topic {q submarine_name
            :name "ssn-12s" } } } }

0819 z uss sterett taken under fire by kirov with ssn-12s
PARAPHRASE: 819 Z USS Sterett taken under fire by Kirov with SSN-12s
TRANSLATION: [Korean output garbled in extraction]

Figure 4: Semantic Frame and Translation
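To make the role of the semantic frame concrete, the following minimal Python sketch encodes a simplified version of the Figure 4 frame as nested dictionaries and walks it to regenerate an English paraphrase. The dictionary layout, the `PHRASES` templates, and all function names are assumptions made for illustration; they are not the TINA or GENESIS data structures, and the time expression is omitted for brevity.

```python
# Hypothetical nested-dict encoding of a simplified Figure 4 frame.
# Key names loosely mirror the figure; the layout itself is an assumption.
frame = {
    "type": "statement",
    "topic": {"type": "ship_name", "name": "sterett", "mod": "uss"},
    "pred": {
        "type": "taken_under_fire",
        "preds": [
            {"type": "v_by", "topic": {"type": "ship_name", "name": "kirov"}},
            {"type": "v_with", "topic": {"type": "submarine_name", "name": "ssn-12s"}},
        ],
    },
}

# Toy English templates keyed by predicate type (illustrative only).
PHRASES = {"taken_under_fire": "taken under fire", "v_by": "by", "v_with": "with"}


def render_topic(topic):
    """Render a noun-like node as 'modifier name', skipping a missing modifier."""
    words = [topic.get("mod", ""), topic["name"]]
    return " ".join(w for w in words if w)


def render_pred(pred):
    """Render a predicate, then its own topic, then nested predicates in order."""
    words = [PHRASES[pred["type"]]]
    if "topic" in pred:
        words.append(render_topic(pred["topic"]))
    for sub in pred.get("preds", []):
        words.append(render_pred(sub))
    return " ".join(words)


def paraphrase(frame):
    """Produce an English paraphrase: topic first, then the predicate chain."""
    return " ".join([render_topic(frame["topic"]), render_pred(frame["pred"])])


print(paraphrase(frame))  # uss sterett taken under fire by kirov with ssn-12s
```

Because the frame carries meaning rather than word order, a generator for another language could in principle traverse the same structure with its own templates and ordering rules, which is the property that makes a frame usable as an interlingua.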
Table 3: Training Data (Data for System Development)

Data Set       Original Data   Modified Data   Total
[illegible]    101             105             206
[illegible]    [...]           [...]           154
[illegible]    [...]           115             215
[illegible]    [...]           [...]           [...]
Total          303             337             794

Table 4: Current System Specifications (digits partly illegible in the extraction)

               Size of lexicon   Size of grammar (Number of categories)
Analysis       1[?]27            1297
Generation     997               763

Table 5: Translation Statistics for Training Data
# of Sentences    [illegible; remaining rows not recoverable]
[...] from a captured airfield against ctf 177 units transiting toward a neutral nation. humint sources indicated 12 strike aircraft have launched (1935z) enroute to the battle force. CTF 177 has positive confirmation that the battle force is targeted (20[?]5z). This is considered a hostile act [...]

MUC-II data also had other typical features of raw text. There were quite a few instances of complex sentences [...]

(3) Two USS America-based strike escort F-14s were engaged [...] (Island 2 target facility) while conducting strike against xxx guerilla camp.

(4) Kirov locked on with fire control radar and fired torpedo [...].

(5) [...] deliberate harassment of uscgc spencer by hostile kirov endangers an already fragile political/military balance between [...]

[...] known words in an in-house experiment.⁸ In this experiment, we asked the subjects to study a list of data set A MUC-II sentences and then create about 10 MUC-II-like sentences on their own. Subjects were told to create sentences which illustrate the general style of the MUC-II sentences and which only use the vocabulary items occurring in the example sentences. We collected 154 MUC-II-like sentences in this experiment. We then evaluated the system's performance on these sentences. [...]

The results of system evaluation on the TEST data (consisting of 40 messages, 111 sentences) are shown in Table 7. The first point to note is that the system parsed 34.8% of the test sentences which have no unknown words. This figure is somewhat lower than the corresponding figure for data set A* (50.6%) because the test set is harder than data set A*. ([...] A* sentences were fabricated by studying A sentences.) The second point to note is that system failure is due in large part to the presence of unknown words. In fact, as Table [...] indicates, about
Table 6: Data Set A* Evaluation Results
[earlier rows illegible in the extraction]
# of Correctly Translated Sentences     71/78 (91.0%)

Table 7: Test Evaluation Results
# of Sentences with No Unknown Words    [illegible]
# of Parsed Sentences                   [illegible]
# of Correctly Translated Sentences     [illegible]

Table 8: Tagger Evaluation on TEST Data

                     Overall Accuracy      Unknown Word Accuracy
Before Training      [...]/1287 (87.4%)    [illegible]
After Training I     1249/1287 (97%)       70/82 [...]
After Training II    [...]/1287 (98%)      [illegible]

[...] translate, it takes an average of 2.28 seconds to translate a sentence containing 16 words. For a fairly complex sentence containing 38 words, it takes about 4 seconds to translate. Some examples of Korean translation output are given in Figure 5. The system runs on a SPARC 10 workstation, and the source code is written in C. Korean translation outputs are displayed on a hangul window running on UNIX.

5 Toward Robust Translation

At the moment, our system is not capable of dealing with a sentence containing (i) unknown words, cf. Section 4.2, and (ii) unknown constructions, cf. Section 4.1. In this section we discuss our on-going efforts to overcome these deficiencies: integration of a part-of-speech tagger to handle unknown words/constructions, and a word-for-word translator to cope with other system failures, cf. (Frederking and Nirenburg, 1995).

5.1 Integration of Part-of-Speech Tagger

Regarding the unknown word problem, an obvious solution is to expand the lexicon. Concerning the problem involving unknown constructions, we could easily generalize the grammar to extend its coverage. However, both of these solutions are problematic. Handling the unknown word problem by increasing the size of the lexicon is not that straightforward given that most unknown words are open class items such as nouns, verbs, adjectives and adverbs. In addition, one cannot generalize the grammar without side effects. Due to the highly telegraphic nature of the MUC-II data, generalizing the grammar will increase the ambiguity of an input sentence greatly, cf. (Grishman, 1989).⁹ Hence, we need alternative solutions to deal with unknown words and unknown constructions. The most desirable solution is to (i) leave the current grammar intact since it efficiently parses even highly telegraphic messages, and (ii) tackle unknown words and unknown constructions by the same mechanism.

A potential solution to the unknown word problem is to do part-of-speech tagging and replace unknown words with their parts-of-speech, and bootstrap the parts-of-speech (instead of the actual words) to the analysis grammar. The unknown words would be replaced in the sentence string with their corresponding part-of-speech tag, and the semantic grammar would be augmented to handle generic adjectives, nouns, verbs, etc., intermixed in the rules at appropriate positions. The idea would be to include just enough semantic information to solve the ambiguity problem, effectively anchoring on words such as ship-name that have high semantic relevance within the domain.

This approach might also be effective as a backoff mechanism when the system fails to parse a sentence containing only known words. A set of semantically significant vocabulary items could be tagged as "immutable", and all the words in the sentence except these anchor words would be converted to part-of-speech prior to a second attempt to parse. The same grammar would be used in all cases.

For the solution sketched above, we have evaluated the Rule-Based Part-of-Speech Tagger (Brill, 1992) on the test data both before and after training on the MUC-II database. These results are given in Table 8. Tagging statistics 'before training' are based on the lexicon and rules acquired from the BROWN CORPUS and the WALL STREET JOURNAL CORPUS. Tagging statistics 'after training' are divided into two categories, both of which are based on the rules acquired from training on data sets A, B, and C of the MUC-II database. The only difference between the two is that in one case (After Training I) we use a lexicon acquired from the MUC-II database, and in the other case (After Training II) we use a lexicon acquired from a combination of the BROWN CORPUS, the WALL STREET JOURNAL CORPUS, and the MUC-II database. Since the tagging result is quite promising, despite the fact that the training data is of modest size, we are planning to integrate the tagger into the analysis module.

5.2 Integration of Word-for-Word Translator

Even though implementing the part-of-speech tagger and extending the analysis grammar to accept parts-of-speech as terminal strings will increase the grammar coverage, it is an almost impossible task to write a grammar which covers all freely occurring natural language texts, let alone have a robust parser to deal with this inadequacy.¹⁰ Despite this difficulty in designing a complete translation system, an ideal translation system ought to be able to produce translations which are useful under any circumstances. Therefore, we are integrating a word-for-word translator,¹¹ which provides tools to aid a human translator, as a fallback system.

Figure 6 shows the planned robust system architecture, with the part-of-speech tagger and the word-for-word translator integrated into the core understanding/generation system. Note that the system will provide an indication or flag to the user showing whether the translation is produced by TINA/GENESIS or by the word-for-word fallback system.

6 Summary

In this paper we have described our ongoing work in automatic English-to-Korean text translation of telegraphic messages. This is a part of our overall effort in text and speech translation for limited-domain multilingual applications. We have described the system architecture (Section 2), the source language text (Section 3), and the system evaluation results (Section 4). We have also discussed ideas on how to make the system robust, and proposed two specific solutions: integration of a part-of-speech tagger and a word-for-word translator (Section 5).

7 Acknowledgements

We would like to acknowledge the following people for their contributions to this project: Jack Lynch, Beth Carlson, Victor Zue, and SungSim Park. We also would like to thank Beth Sundheim of NRaD, Professor Ralph Grishman

⁹ Recall that we resolve the ambiguity problem by constraining the grammar with semantic categories.
¹⁰ See (Sleator, 1991) for a design of a robust parser which handles unknown constructions.
¹¹ The word-for-word translator is being developed by GARJAK under a subcontract.
at 1609 z hostile forces launched massive recon effort from captured airfield against ctf 177 units transiting toward a neutral nation
PARAPHRASE: 1609 z hostile forces launched massive reconnaissance effort from captured airfield against ctf 177 units transiting toward a neutral nation
TRANSLATION: [Korean output garbled in extraction]

lost hostile acft in vicinity
PARAPHRASE: lost hostile aircraft in vicinity
TRANSLATION: [Korean output garbled in extraction]

two uss america based strike escort f dash 14 s were engaged by unknown number of hostile su dash 7 aircraft near land 9 bay lparen island 2 target facility rparen while conducting strike against xxx guerilla camp
PARAPHRASE: 2 USS America-based strike escort F-14s were engaged while conducting strike against xxx guerilla camp by hostile Su-7 aircraft unknown number of near Land9 Bay ( Island2 target facility )

Figure 5: Sample Korean Translation Output

Figure 6: Robust Translation System
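The control flow of the planned robust architecture can be sketched in a few lines of Python: try the normal parse, retry with unknown words replaced by part-of-speech tags, retry again keeping only the "immutable" anchor words, and finally fall back to word-for-word output while flagging which path produced the result. This is only a sketch under stated assumptions: every function name, the toy lexicon, the toy grammar, and the placeholder outputs are hypothetical stand-ins, not the TINA/GENESIS, Brill tagger, or word-for-word translator interfaces.

```python
from typing import Callable, List, Optional, Set, Tuple

Tokens = List[str]
Tagger = Callable[[str], str]
Parser = Callable[[Tokens], Optional[Tuple[str, ...]]]


def substitute_unknown(tokens: Tokens, lexicon: Set[str], tag: Tagger) -> Tokens:
    """Replace every out-of-lexicon word with its part-of-speech tag."""
    return [t if t in lexicon else tag(t) for t in tokens]


def keep_anchors_only(tokens: Tokens, anchors: Set[str], tag: Tagger) -> Tokens:
    """Backoff: keep only the 'immutable' anchor words and tag everything else."""
    return [t if t in anchors else tag(t) for t in tokens]


def robust_translate(tokens, lexicon, anchors, tag, parse: Parser,
                     generate, word_for_word) -> Tuple[str, str]:
    """Return (translation, flag); the flag names the path that produced it."""
    attempts = [
        tokens,                                    # 1. sentence as given
        substitute_unknown(tokens, lexicon, tag),  # 2. unknown words -> tags
        keep_anchors_only(tokens, anchors, tag),   # 3. all but anchors -> tags
    ]
    for attempt in attempts:
        frame = parse(attempt)
        if frame is not None:
            return generate(frame), "TINA/GENESIS"
    return word_for_word(tokens), "word-for-word"


# Toy stand-ins (hypothetical) so the sketch runs end to end.
TOY_TAGS = {"lost": "VERB", "hostile": "ADJ", "aircraft": "NOUN", "trawler": "NOUN"}
TOY_GRAMMAR = {("lost", "hostile", "aircraft"), ("lost", "hostile", "NOUN")}
LEXICON = {"lost", "hostile", "aircraft"}
ANCHORS = {"hostile", "aircraft"}


def toy_tag(word: str) -> str:
    return TOY_TAGS.get(word, "NOUN")


def toy_parse(tokens: Tokens) -> Optional[Tuple[str, ...]]:
    key = tuple(tokens)
    return key if key in TOY_GRAMMAR else None


def toy_generate(frame: Tuple[str, ...]) -> str:
    return "[" + " ".join(frame) + "]"   # placeholder for target-language output


def toy_word_for_word(tokens: Tokens) -> str:
    return " ".join(t.upper() for t in tokens)  # placeholder gloss


def demo(sentence: str) -> Tuple[str, str]:
    return robust_translate(sentence.split(), LEXICON, ANCHORS, toy_tag,
                            toy_parse, toy_generate, toy_word_for_word)


print(demo("lost hostile aircraft"))  # ('[lost hostile aircraft]', 'TINA/GENESIS')
print(demo("lost hostile trawler"))   # ('[lost hostile NOUN]', 'TINA/GENESIS')
print(demo("trawler lost"))           # ('TRAWLER LOST', 'word-for-word')
```

Note that the same grammar is consulted in every attempt; only the token string changes, which matches the stated goal of leaving the existing grammar intact.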
of NYU, Professor Key-Sun Choi of KAIST, Korea, and Willis Kim of MITRE. Beth Sundheim provided us with the MUC-II data as well as technical reports documenting the data. Dr. Grishman provided us with his grammar, dictionary, and semantic models for the MUC-II data. These materials helped us understand the linguistic properties of the MUC-II corpus. Dr. Choi provided us with documentation and software for KAIST's MATES/EK English-to-Korean machine translation system. Kim provided us with electronic English/Korean dictionaries as well as a report on his work.

References

Eric Brill. 1992. A Simple Rule-Based Part of Speech Tagger. Proceedings of the Third Conference on Applied Natural Language Processing. Trento, Italy: ACL.

Key-Sun Choi, Seungmi Lee, Hiongun Kim, Deok-Bong Kim, Cheoljung Kweon, and Gilchang Kim. 1994. An English-to-Korean Machine Translator: MATES/EK. Proceedings of the 15th International Conference on Computational Linguistics. Kyoto, Japan.

Robert Frederking and Sergei Nirenburg. 1995. Three Heads are Better than One. C-STAR II Meeting, Pittsburgh.

James Glass, Joseph Polifroni and Stephanie Seneff. 1994. Multilingual Language Generation Across Multiple Domains. Proceedings of the 1994 International Conference on Spoken Language Processing. Yokohama, Japan.

Ralph Grishman. 1989. Analyzing Telegraphic Messages. Proceedings of the 1989 DARPA Speech and Natural Language Workshop. Cape Cod, Massachusetts.

W. Kim and W. Rhee. 1994. Machine Translation Evaluation. Seoul, Korea: MITRE.

Youngjik Lee, Young-Sum Kim, Jung-Chul Lee, Joon-Hyung Ryoo and Joe-Woo Yang. 1995. Korean-Japanese Speech Translation System for Hotel Reservation - Korean front desk side. European Conference on Speech Communication and Technology. Madrid.

Stephanie Seneff. 1992. TINA: A Natural Language System for Spoken Language Applications. Computational Linguistics, 18(1): 61-88.

Daniel D.K. Sleator and Davy Temperley. 1991. Parsing English with a Link Grammar. CMU-CS-91-196.

Beth M. Sundheim. 1989. Navy Tactical Incident Reporting in a Highly Constrained Sublanguage: Examples and Analysis. Technical Document 1477. Naval Ocean Systems Center, San Diego.

Dinesh Tummala, Stephanie Seneff, Douglas Paul, Clifford Weinstein, and Dennis Yang. 1995. CCLINC: System Architecture and Concept Demonstration of Speech-to-Speech Translation for Limited-Domain Multilingual Applications. Proceedings of the 1995 ARPA Spoken Language Technology Workshop. Austin, Texas.

Dennis W. Yang. 1996. Korean Language Generation in an Interlingua-based Speech Translation System. Technical Report 1026. MIT Lincoln Laboratory.