
Supporting Communication among Different Languages

The UNL system allows people to communicate with speakers of other languages in their own mother tongue. The UNL is a common language for exchanging information through computers that can process natural languages. The UNL system basically consists of language servers, UNL editors and UNL viewers. A system that converts a native language into UNL is called an "enconverter", and one that converts UNL into a native language is called a "deconverter". Information "enconverted" from any language is exchanged in UNL format over networks. Information represented in UNL is "deconverted" into each native language at the network terminal.

Language Server
A language server consists of a deconverter and an enconverter. The processes of "enconversion" and "deconversion" are provided by a Language Server that resides on the Internet. The "enconverter" converts a particular language into UNL, while the "deconverter" converts UNL into that native language.

Deconverter
A "deconverter" is software that automatically deconverts UNL into native languages. It is important that it achieves high quality and correct results. It is also important that the basic architecture of the "deconverter" is widely shared throughout the world, so that all languages are treated with the same quality and precision standards. Technology developed for one language can be applied to other languages as long as the architecture is shared. A "deconverter", which generates natural language from UNL, plays a core role in the UNL system, so it is essential that it can express UNL information with very high accuracy. It follows that information, once composed in UNL, can be understood in any language for which a "deconverter" exists.

Enconverter
An "enconverter" is software that automatically or interactively enconverts natural language text into UNL. UNU/IAS developed enconversion software called "EnCo", which constitutes an enconverter together with a word dictionary, a co-occurrence dictionary and conversion rules for a language. "EnCo" is language independent software, so it is applicable to any language. Because an "enconverter" generates UNL from natural languages, it enables people to make UNL documents without any knowledge of UNL. This means that users of the UNL system do not need to learn UNL, which makes UNL quite different from Esperanto, for instance.

UNL Editor and Viewer


The UNL editor is used to make UNL documents. It is linked to a language server equipped with an "enconverter" and a "deconverter" for a natural language. As the author writes a document, e-mail or any other text in his/her language, the UNL editor "enconverts" it into a UNL document. In this process, UNL expressions are produced automatically or interactively with the author. There are four kinds of UNL editor according to the method of enconversion:
1. full automatic enconversion for natural language texts;
2. full automatic enconversion for controlled or tagged language texts;
3. interactive enconversion for natural language texts;
4. word by word input method.

The correctness of the generated UNL increases from 1) to 4), but so does the cost of making UNL documents. Users can choose the enconversion method according to the purpose of the UNL documents they want to make. The UNL editor also shows the content of a UNL document in the author's native language, which reveals how the editor has understood the original document. The author can thus check the correctness of the "enconversion". In this verification, the high accuracy of "deconversion" counts a lot. When the result is not correct enough, the author can either rewrite the original document or modify the UNL interactively, following the guidance provided by the editor. In this way the author can produce a UNL document that is as correct as desired.

The UNL viewer is used to read a UNL document in the user's native language. It uses a language server when it deconverts UNL documents into the user's native language.

In Internet communication, the dominant text format, HTML, can hold many links to other documents, enabling readers to refer to various kinds of related documents. In summary, an electronic document contains various supplementary information, which increases its usability. UNL information is to be treated equally in the network. One of the merits of HTML is that it allows the whole document to be produced in plain text. In general, information contained in an electronic document is divided into text and embedded instructions. In HTML, however, even the embedded instructions are described in plain text. This characteristic makes HTML adaptable to any editing system while retaining the advantages of hyper-text. Furthermore, the HTML description format for embedding is open to the public. HTML conventions are still expanding and developing, and conventions for treating UNL information are expected to be regarded as one such extension of HTML.
In order to achieve this universality, it is proposed that the description format for UNL expressions be considered an extension of the HTML conventions. UNL information can be embedded in an HTML document with tags attached at its beginning and end, which mark the UNL information. Extensions of the conventions should conform to existing HTML so that UNL expressions can be handled like other documents, without damaging the HTML hyper-text structure. In order to conform to the HTML conventions, descriptions in UNL will be in plain text.

Outline

The UNL System consists of three major components: language resources, software for processing language resources, and supporting tools for maintaining and operating the language processing software or developing language resources.

Language resources are divided into a language dependent part and a language independent part. Knowledge about concepts and the relations between concepts of words that is universal to every language is considered language independent and is stored in the common database, the UNL Ontology (UNLKB). Language dependent resources such as word dictionaries and analysis and generation rules, as well as the software for language processing, are stored in each Language Server (LS). Language Servers are connected and operate through the Internet.

Supporting tools for producing language resources such as UNL Expressions are basically used on a local computer. Verification of UNL Expressions can be carried out through the Internet or on a local computer. These tools operate by consulting Language Servers through the Internet. Supporting tools for developing and maintaining the UNL Ontology, which holds the language independent linguistic knowledge, and the UW dictionary, which holds the links between UWs and every language, are stored on a server under the management of the UNL Center and can be accessed through the Internet from anywhere.

With all these components, the UNL System provides the UNL functions: it makes it possible to produce UNL Expressions (Documents) from natural languages and gives people access to UNL Documents.

Mechanism of Conversion between Languages and UNL Expressions

Figure 1 shows how the conversion between a natural language and a UNL Document is carried out in the UNL System. Solid arrows show dataflow and broken arrows show access. The EnConverter and DeConverter are the core software of the UNL System. The EnConverter converts natural language sentences into UNL Expressions. The Universal Parser (UP) is a specialized version of the EnConverter: it generates UNL Expressions from annotated sentences by referring to the UW dictionary, without using grammatical features. All UNL Expressions are verified by the UNL Verifier and then stored in the UNL Document format. The DeConverter converts UNL Expressions into natural language sentences. Both the EnConverter and the DeConverter perform their functions based on a set of grammar rules and a word dictionary of the target language. Consulting the UNL Ontology and/or a co-occurrence dictionary is optional in both.

Figure 1. Conversion mechanism of the UNL System

Figure 2 shows the structure of the UNL System and how it is connected with supporting tools and UNL-based applications. The highlighted parts are the components of the UNL System.

Figure 2. Structure of the UNL System and applications

The components shown in Figures 1 and 2 are described below.

EnConverter
The EnConverter is a language independent parser that provides a framework for performing morphological, syntactic and semantic analysis synchronously. It would be impossible to resolve all morphological ambiguities if syntactic or semantic analysis were not performed at the same time, and it would be impossible to resolve every syntactic ambiguity in the absence of semantic analysis. The EnConverter works with a word dictionary and a set of enconversion rules (grammar rules of enconversion), and analyzes sentences according to those rules. It can deal with various natural languages by using their respective word dictionaries and sets of enconversion rules.

The EnConverter works in the following way. The input string of a sentence is scanned from left to right. As the string is scanned, all morphemes that match from the beginning (left) of the string are retrieved from the word dictionary and become the candidate morphemes. These candidates are sorted according to priority. Word selection is done by applying enconversion rules to the candidate morphemes. Syntactic and semantic analysis is carried out by applying the rules to the words already selected, building up a syntactic tree and a semantic network for the input sentence. This process continues until all words of the sentence have been input and a complete semantic network of the input sentence has been made. The output of the whole process is a semantic network expressed in the UNL format.

The EnConverter can also consult the UNL Ontology, which helps to select appropriate UWs for ambiguous words and appropriate relations between UWs.

Figure 3 shows the structure of the EnConverter. A indicates analysis windows and C indicates condition windows. The EnConverter operates on the node-list through the analysis windows; the condition windows are used to check conditions when a rule is applied. In the initial stage, only one node of an input sentence exists in the node-list. At the end of enconversion, a syntactic tree together with a semantic network has been made, and the root node remains in the node-list.

Figure 3. Structure of EnConverter

For details see the specifications of the EnConverter at http://www.undl.org/unlsys/ds.html
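The candidate retrieval step described above can be sketched in a few lines of Python. This is only an illustration, not the EnCo implementation: the toy dictionary, the `Entry` type, the function name `candidate_morphemes`, the UWs other than `book(icl>document)`, and the ordering (longest headword first, then higher priority) are all assumptions, since the text only states that candidates are "sorted according to priority".

```python
from typing import NamedTuple

class Entry(NamedTuple):
    headword: str
    uw: str
    priority: int  # assumption: a larger value means "preferred"

# Toy dictionary; the entries are illustrative only.
DICTIONARY = [
    Entry("book", "book(icl>document)", 1),
    Entry("book", "book(icl>reserve)", 2),
    Entry("bookshop", "bookshop(icl>shop)", 1),
    Entry("boo", "boo(icl>jeer)", 1),
]

def candidate_morphemes(text, dictionary):
    """Return the entries whose headword matches the start of `text`, best first."""
    matches = [e for e in dictionary if text.startswith(e.headword)]
    # Assumed ordering: longer headwords first, then higher priority.
    return sorted(matches, key=lambda e: (-len(e.headword), -e.priority))

candidates = candidate_morphemes("bookshop is open", DICTIONARY)
print([c.headword for c in candidates])
```

In a real enconverter, the enconversion rules would then accept or reject these candidates one by one; the sketch stops at producing the ordered list.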

DeConverter
The DeConverter is a language independent generator that provides a framework for performing syntactic and morphological generation synchronously. It can convert UNL Expressions into a variety of natural languages by using the respective word dictionaries and sets of deconversion rules of those languages. A word dictionary contains the words that correspond to the UWs included in the input UNL Expressions, together with the grammatical attributes (features) that describe the behavior of the words. Deconversion rules (grammar rules of deconversion) describe how to construct a sentence using the information from the input UNL Expressions and the information defined in a word dictionary. The DeConverter converts UNL Expressions into sentences of a target language by following the deconversion rules.

Word selection based on co-occurrence relations, for natural collocation, can also be carried out synchronously. For this purpose, a co-occurrence dictionary of the target language is necessary. The UNL Ontology is also helpful when no corresponding word for a particular UW exists in a language. In this case, the DeConverter consults the UNL Ontology to find a more general (upper) UW for which a corresponding word exists in its word dictionary, and uses the word of the upper UW to generate the target sentence instead.

The DeConverter works in the following way. It first transforms the input UNL Expression (a set of binary relations) into a directed graph structure with hyper-nodes, called a node-net. The root node of a node-net is called the entry node and represents the head (e.g. the main verb) of a sentence. Deconversion of a UNL Expression is carried out by applying deconversion rules to the nodes of the node-net. It starts from the entry node, finding an appropriate word for each node and generating a word sequence (a list of words in grammatical order) of the target language. In this process, the syntactic structure is determined by applying syntactic rules, and morphemes are similarly generated by applying morphological rules. The deconversion process ends when words have been found for all nodes and the word sequence of the target sentence is complete.

Figure 4 shows the structure of the DeConverter. G indicates generation windows and C indicates condition windows. The DeConverter operates on the node-list through the generation windows; the condition windows are used to check conditions when a rule is applied. In the initial stage, in contrast to the EnConverter, the entry node of a UNL Expression exists in the node-list. At the end of deconversion, the node-list is the list of all morphemes, each as a node, that were converted from the node-net and constitute the target sentence.

Figure 4. Structure of DeConverter

For details see the specifications of the DeConverter at http://www.undl.org/unlsys/ds.html
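As a rough illustration of the first deconversion step, the sketch below turns a set of binary relations into a directed graph and walks it from the entry node. It rests on several assumptions: relations are given as plain `(relation, source, target)` triples rather than in UNL syntax, hyper-nodes are ignored, the graph is acyclic, and the entry node is taken to be the single node that is never a target. The function names are hypothetical.

```python
from collections import defaultdict

# (relation, source UW, target UW); the triples are illustrative.
BUY = "buy(icl>purchase(agt>thing,obj>thing))"
relations = [
    ("agt", BUY, "I(icl>person)"),
    ("obj", BUY, "book(icl>document)"),
]

def build_node_net(relations):
    """Map each source node to its outgoing (relation, target) arcs."""
    net = defaultdict(list)
    for rel, src, dst in relations:
        net[src].append((rel, dst))
    return dict(net)

def find_entry_node(relations):
    """Assumption: the entry node is the only node that is never a target."""
    sources = {src for _, src, _ in relations}
    targets = {dst for _, _, dst in relations}
    roots = sources - targets
    if len(roots) != 1:
        raise ValueError("expected exactly one root node")
    return roots.pop()

def traverse(net, node):
    """Visit nodes depth-first, starting from the entry node."""
    order = [node]
    for _, child in net.get(node, []):
        order.extend(traverse(net, child))
    return order

net = build_node_net(relations)
entry = find_entry_node(relations)
print(traverse(net, entry))
```

The traversal order here is simply the order in which relations were listed; in the DeConverter the word order of the target sentence is decided by the deconversion rules, not by the graph walk.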

Dictionary Builder
The Dictionary Builder (abbreviated as DicBld) is a tool that converts the text data of dictionary entries into dictionary files in IBAM (Index Based Access Method) format. IBAM was devised for quick data search. In the UNL System, all dictionary data, such as the word dictionary, UNL Ontology, co-occurrence dictionary and KCIC, are stored in IBAM format. The input data format of DicBld is explained under Word Dictionary below. Figure 5 shows the structure of DicBld.

Figure 5. Structure of DicBld

Word Dictionary
Word dictionaries are prepared for the respective languages. All information about the words of a natural language is stored in a word dictionary. An entry of the word dictionary basically contains three parts: a headword, a UW and a set of grammatical attributes (features).

A headword is a word or a morpheme of a natural language. A sequence of such words or morphemes composes a sentence. A headword is therefore used as a trigger to obtain an appropriate UW when creating the UNL Expression from an input sentence in the enconversion process, and it forms part of the target sentence generated from a UNL Expression in the deconversion process.

The UW of an entry expresses the meaning of its headword. It appears in the UNL Expression that results from the enconversion process, and is used as a trigger to obtain an appropriate word or morpheme when generating a target sentence from a UNL Expression in the deconversion process.

Grammatical attributes define how a word or a morpheme behaves in a sentence. They are used in both the enconversion and the deconversion rules.

The data format of a word dictionary entry is as follows. An entry must end with a semicolon. <FLG,FRE,PRI> can be omitted.

[HW]{ID} UW (ATTR,ATTR,...) <FLG,FRE,PRI>;

HW    headword of a language
ID    identifier, can be empty
UW    Universal Word, can be empty if not necessary
ATTR  grammar code
FLG   language flag, one character in ASCII code
FRE   frequency to be used in EnCo
PRI   priority to be used in DeCo

Examples of entries of English dictionary:

[a]{} "" (ART,IART) <E,1,1>;
[book]{} "book(icl>document)" (BA,C,N,PLS) <E,1,1>;
[buy]{} "buy(icl>purchase(agt>thing,obj>thing))" (3SGS,AGT.S,BA,INGING,IRG,OBJ.DO,V,VDO,VDON) <E,1,1>;
[bought]{} "buy(icl>purchase(agt>thing,obj>thing))" (AGT.S,ED,EN,IRG,OBJ.DO,V,VDO,VDON) <E,1,1>;
[I]{} "I(icl>person)" (1SG,HPRON,PRON,SUBJ) <E,1,1>;
[me]{} "I(icl>person)" (1SG,HPRON,OBJ,PRON) <E,1,1>;
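To make the entry format concrete, here is a minimal Python sketch that parses entries of the shape shown in the examples above. It is not part of the UNL tools: the regular expression, the function name `parse_entry` and the keys of the returned dictionary are assumptions, and the UW is expected in double quotes because that is how it appears in the example entries.

```python
import re

ENTRY_RE = re.compile(
    r'\[(?P<hw>[^\]]*)\]'        # [HW]   headword
    r'\{(?P<id>[^}]*)\}\s*'      # {ID}   identifier, may be empty
    r'"(?P<uw>[^"]*)"\s*'        # "UW"   Universal Word, may be empty
    r'\((?P<attrs>[^)]*)\)\s*'   # (ATTR,ATTR,...) grammatical attributes
    r'(?:<(?P<flg>[^,>]*),(?P<fre>\d+),(?P<pri>\d+)>)?\s*;'  # optional <FLG,FRE,PRI>
)

def parse_entry(line):
    """Parse one word dictionary entry into a plain dict."""
    m = ENTRY_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not a dictionary entry: {line!r}")
    return {
        "headword": m.group("hw"),
        "id": m.group("id"),
        "uw": m.group("uw"),
        "attrs": [a for a in m.group("attrs").split(",") if a],
        "flag": m.group("flg"),
        "frequency": int(m.group("fre")) if m.group("fre") else None,
        "priority": int(m.group("pri")) if m.group("pri") else None,
    }

entry = parse_entry('[book]{} "book(icl>document)" (BA,C,N,PLS) <E,1,1>;')
print(entry["headword"], entry["uw"], entry["attrs"])
```

An entry without the optional `<FLG,FRE,PRI>` part parses with `flag`, `frequency` and `priority` set to `None`.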

Grammar Rule
The DeConverter works based on a set of deconversion rules, and the EnConverter likewise on a set of enconversion rules. Both sets must be prepared for each language, and each set controls the process of enconversion or deconversion. A rule describes which operation to perform under which conditions, using grammatical features that are defined by the rules, given in a Word Dictionary, or both. In this way a set of enconversion or deconversion rules can be prepared for a desired language, allowing the EnConverter or the DeConverter to deal with that language.

Rule application is controlled by priorities and allows backtracking. When the EnConverter or the DeConverter encounters an ungrammatical or illogical situation, the rules can force the process to backtrack. Backtracking returns to the previous state, where the next candidate word or morpheme is selected or the rule with the next priority is applied, and the conversion process proceeds from there.

The operation to perform in enconversion or deconversion is given by the type of a rule in combination with the sort of the rule. There are three sorts of rules, as follows.

Rewriting rule
<TYPE> (<PRE>) {<LNODE>} {<RNODE>} (<SUF>) P<PRI>;

Left-insertion rule
<TYPE> (<PRE>) <LNODE> (<MID>) {<RNODE>} (<SUF>) P<PRI>;

Right-insertion rule
<TYPE> (<PRE>) {<LNODE>} (<MID>) <RNODE> (<SUF>) P<PRI>;

<TYPE>                type of rule
<LNODE>, <RNODE>      ::= <COND>:<ACTION>:<RELATION>:<ROLE>
<PRE>, <MID>, <SUF>   ::= <COND>, can be omitted
<COND>                conditions for applying the rule
<ACTION>              actions (changes of conditions) after the rule is applied
<RELATION>            semantic relation between <LNODE> and <RNODE>
<ROLE>                co-occurrence relation between <LNODE> and <RNODE>
<PRI>                 priority of the rule

For details see specifications of DeConverter and EnConverter at http://www.undl.org/unlsys/ds.html
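The priority-ordered selection with backtracking described in this section can be illustrated with a small depth-first search. This is a generic sketch, not the rule engine of EnCo or DeCo: the function `convert`, the predicate `no_pron_noun` (a stand-in for a set of grammar rules) and the toy candidates are all hypothetical.

```python
def convert(slots, is_consistent, chosen=()):
    """Pick one candidate per slot, trying candidates in priority order.

    slots: a list of candidate lists, each already sorted best-first.
    is_consistent: hypothetical predicate standing in for the grammar rules.
    Returns the first consistent selection, or None if there is none.
    """
    if len(chosen) == len(slots):
        return list(chosen)
    for candidate in slots[len(chosen)]:
        attempt = chosen + (candidate,)
        if is_consistent(attempt):
            result = convert(slots, is_consistent, attempt)
            if result is not None:
                return result
        # Falling through to the next candidate is the backtracking step.
    return None

def no_pron_noun(sequence):
    """Toy rule: a pronoun must not be directly followed by a noun."""
    return all(not (a[1] == "PRON" and b[1] == "N")
               for a, b in zip(sequence, sequence[1:]))

# "book" is listed as a noun first, so the noun reading is tried and rejected.
slots = [[("I", "PRON")], [("book", "N"), ("book", "V")]]
result = convert(slots, no_pron_noun)
print(result)
```

When the noun reading of "book" violates the toy rule, the search returns to the previous state and selects the next candidate, which mirrors the behaviour the text describes.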

UNL Document
For information on UNL Document see UNL Document under UNL Expression of UNL.

UNL KCIC
The UNLKCIC is information on Key Concepts in Context (KCIC) of UNL Expressions. It is a collection of such information made for every binary relation of the UNL Expressions, and is used in searching UNL Documents for information. Through the UNLKCIC, every UW of the UNL Ontology is linked with the UNL Documents in which the UW is included. Consequently, all UWs included in a UNL Document are linked with the corresponding UNL Documents through the UNL Ontology. To realize this inter-linkage of UWs across UNL Documents, every UW must be registered in the UNL Ontology.
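One way to picture the UNLKCIC is as an inverted index from each UW to the documents and binary relations in which it occurs. The sketch below assumes exactly that; the actual storage format (IBAM) and record layout are not described here, and the document identifiers, triples and function names are invented for illustration.

```python
from collections import defaultdict

BUY = "buy(icl>purchase(agt>thing,obj>thing))"

# Hypothetical UNL Documents, each reduced to a list of binary relations.
documents = {
    "doc1": [("agt", BUY, "I(icl>person)"), ("obj", BUY, "book(icl>document)")],
    "doc2": [("obj", BUY, "book(icl>document)")],
}

def build_kcic_index(documents):
    """Index every UW by the (document, binary relation) pairs it occurs in."""
    index = defaultdict(list)
    for doc_id, relations in documents.items():
        for relation in relations:
            _, uw1, uw2 = relation
            index[uw1].append((doc_id, relation))
            index[uw2].append((doc_id, relation))
    return index

def documents_containing(index, uw):
    """Return the sorted ids of all documents in which `uw` occurs."""
    return sorted({doc_id for doc_id, _ in index.get(uw, [])})

index = build_kcic_index(documents)
print(documents_containing(index, "book(icl>document)"))
print(documents_containing(index, "I(icl>person)"))
```

Because each index record keeps the whole binary relation, a lookup returns the UW together with its immediate context, which is the "key concept in context" idea.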

UW Dictionary
UWs and the corresponding words of natural languages are stored in the UW Dictionary. The UW Dictionary can be used as a multilingual dictionary, both by people and by computers, with all synonyms and equivalent words of different natural languages linked together through UWs. For example, the UW Gate and the UNL Explorer use the UW Dictionary so that the UNL Ontology can be shown in and searched by natural languages.

UW Gate
The UW Gate provides people with the means to access the UNL Ontology and the UW Dictionary through the Internet. Using the UW Gate, people can search for desired UWs, relations between UWs, equivalent words in desired natural languages, and so on. Authorized persons can also define new UWs or register new equivalent words of natural languages. New UWs are placed in appropriate positions in the UW System by following the guidance of the UW Gate, so that the functions of the UNL Ontology work well. New equivalent words of natural languages are registered by linking them to existing UWs. The UW Gate can be used as an online multilingual dictionary.

The main characteristic of the UW Gate is semantic co-occurrence relation search. The UNL Ontology can be queried in three ways: whether a relation between two UWs is true or not, which UWs can have a given relation with another UW, and which relations are possible between two UWs. Every search is carried out by inference using the property inheritance mechanism of the UW System. The UW Gate can be used at http://www.undl.org/uwgate/.
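The three kinds of query and the role of property inheritance can be sketched as follows. Everything in this example is an assumption made for illustration: UWs are shortened to bare labels, `PARENT` is a toy inclusion hierarchy, `DEFINED` holds relations declared between general UWs, and the function names are hypothetical. It does not reflect the real UW Gate interface.

```python
# Hypothetical inclusion hierarchy: child UW -> more general (upper) UW.
PARENT = {
    "person": "thing",
    "document": "thing",
    "book": "document",
    "purchase": "do",
    "buy": "purchase",
}
# Relations declared between general UWs; more specific UWs inherit them.
DEFINED = {("agt", "purchase", "thing"), ("obj", "purchase", "thing")}

def ancestors(uw):
    """Return the UW followed by all of its upper UWs."""
    chain = [uw]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def is_possible(rel, uw1, uw2):
    """Query 1: is rel(uw1, uw2) true, directly or by inheritance?"""
    return any((rel, a, b) in DEFINED
               for a in ancestors(uw1) for b in ancestors(uw2))

def possible_partners(rel, uw1):
    """Query 2: which UWs can stand in relation rel with uw1?"""
    all_uws = set(PARENT) | set(PARENT.values())
    return sorted(u for u in all_uws if is_possible(rel, uw1, u))

def possible_relations(uw1, uw2):
    """Query 3: which relations are possible between uw1 and uw2?"""
    return sorted({r for r, a, b in DEFINED
                   if a in ancestors(uw1) and b in ancestors(uw2)})

print(is_possible("agt", "buy", "person"))
print(possible_partners("obj", "buy"))
print(possible_relations("buy", "book"))
```

No relation is stored for "buy" itself; each answer is inferred from what is declared for "purchase" and "thing", which is the inheritance idea the text refers to.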

Universal Parser
The Universal Parser generates UNL Expressions from sentences using only language independent annotations, without any language dependent grammatical information. Input sentences of the Universal Parser must be annotated with UNL Annotation. The Universal Parser analyzes the annotated input sentences using Universal Parser Rules and a UW Dictionary. The Universal Parser Rules describe operations for creating UNL Expressions using only the information in the tags inserted in the input sentences. The UW Dictionary provides the UWs linked with the words of the input sentences. The Universal Parser analyzes the input sentences according to the Universal Parser Rules and generates UNL Expressions using the UWs linked with the words of the input sentences.

This mechanism makes it possible for the Universal Parser to deal with any language; in this sense, it is universal. To use the truly universal form of the parser, either every form of a word must be included in the UW Dictionary or, if the UW Dictionary contains only base forms, all inflected forms in the input sentences must be changed into base forms. Alternatively, by simply extending the Universal Parser Rules with a set of morphological analysis rules for a language, a morphologically customized annotation-based parser for that language can easily be made.

Figure 6. Structure of Universal Parser

For details see http://www.undl.org/unlsys/uparser/UP.htm (1.0, 2003). The UP can be used at www.undl.org/up/. For more information about the UP, please contact the UNL Center at unlcenter@undl.org.

UNL Verifier
The UNL Verifier verifies whether a UNL Expression is correct syntactically, lexically and semantically. The syntax check is carried out following the UNL Specifications. The lexical check tests whether all UWs of a UNL Expression are defined in the UNL Ontology. The semantic check consults the UNL Ontology to confirm that each binary relation of a UNL Expression is defined as possible. Figure 7 shows the flowchart of the UNL Verifier and how the dictionaries are used.

Figure 7. Flowchart of UNL Verifier
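The lexical and semantic checks can be sketched as a function that returns a list of error messages. This is a simplified illustration under stated assumptions: the syntax check is omitted, UWs are shortened to bare labels, and `KNOWN_UWS`, `PARENT`, `ALLOWED` and the message wording are invented stand-ins for the UNL Ontology, not the Verifier's real data or output.

```python
# Hypothetical ontology contents.
KNOWN_UWS = {"buy", "book", "person", "thing"}
PARENT = {"book": "thing", "person": "thing"}          # child -> upper UW
ALLOWED = {("agt", "buy", "thing"), ("obj", "buy", "thing")}

def generalizations(uw):
    """Yield the UW itself and then each of its upper UWs."""
    yield uw
    while uw in PARENT:
        uw = PARENT[uw]
        yield uw

def verify(relations):
    """Return a list of lexical and semantic errors (empty if none)."""
    errors = []
    for rel, uw1, uw2 in relations:
        # Lexical check: every UW must be defined in the ontology.
        unknown = [u for u in (uw1, uw2) if u not in KNOWN_UWS]
        if unknown:
            errors.extend(f"lexical: undefined UW {u}" for u in unknown)
            continue
        # Semantic check: the relation must be defined as possible.
        possible = any((rel, a, b) in ALLOWED
                       for a in generalizations(uw1)
                       for b in generalizations(uw2))
        if not possible:
            errors.append(f"semantic: {rel}({uw1}, {uw2}) is not defined as possible")
    return errors

print(verify([("agt", "buy", "person"), ("obj", "buy", "book")]))
print(verify([("obj", "buy", "novel"), ("agt", "book", "buy")]))
```

A relation whose UWs fail the lexical check is not passed on to the semantic check, since there is nothing in the ontology to test it against.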

Language Server
UNL Language Servers (LSs) are located on the Internet to carry out the conversion processes between natural languages and UNL Expressions. A Language Server contains an EnConverter and a DeConverter for a language. UNL Language Servers start a conversion when they receive a request from any application, including web applications, and return the result when the conversion is completed. Figure 8 shows an example of how Language Servers of the UNL System work through the Internet.

Figure 8. How Language Servers work through the Internet
