
Bilingual Dictionaries Past, Present and Future

B.T.S. Atkins

Abstract

The past is print dictionaries; the present is print dictionaries with some electronic versions of the same text; the future must be print dictionaries and truly electronic dictionaries, compiled afresh for the new medium, enriched with new types of information the better to meet the needs of the multifarious users. The paper sets out the various aspects of the bilingual dictionary which must be taken into account if the new dictionaries are to be different from (and better than) the old. A design for a new electronic bilingual dictionary is sketched out, applying a frame semantics approach to corpus analysis. A demonstration of the prototype multilingual hypertext Dictionary of the Future will be given.

1. Looking at today's dictionaries

Change is not something that people tend to associate with dictionaries. Changing these highly labour-intensive products is not to be undertaken lightly. (Here I am talking about large-scale, radical change, not simply updatings and corrections.) The heavy cost of dictionary production, and the penalty to be paid for errors of judgement, have made it almost impossible for any radically new dictionary to come into being. Of course, our dictionaries of the present do look a little different from their predecessors, and do behave a little better (it is becoming rarer now to find dictionaries with hermetically sealed nuggets of information coded up to defy interpretation by all but the dogged few); they may even come to you on a CD-ROM rather than in book form, but underneath these superficial modernizations lurks the same old dictionary. Some of the more innovative may introduce a few new types of information (corpus frequencies are the flavour of the month), but when it comes to setting out the meanings of words, giving them definitions or equivalents in another language, including examples, idioms, pronunciations, usage notes, cross-references and the score or so of other kinds of information, tradition rules supreme. Most dictionaries are sublimely unaffected by the highly relevant work currently being done by linguists, especially in lexical semantics. The dictionary of the present is at heart little different from the dictionary of the past. Will the dictionary of the future simply blip its little electronic way off into the sunset, dazzling its readers with the speed with which it dishes up the same old facts on a technicolor screen?

It is up to us to take up the real challenge of the computer age, by asking not how the computer can help us to produce old-style dictionaries better, but how it can help us to create something new: to look at the needs of dictionary users of every language, and every walk of life, users as diverse as people themselves, and give them the kind of information they need for whatever they are using the dictionary for, and not simply the popular selection of facts that will pack semi-legibly inside book covers. I respect and admire the achievements of our great predecessors. But if they were here today, I put it to you that they would not be simply reproducing the achievements of their elders, or revising the great works of the past: they would be rooting for a new kind of dictionary, one in which the computer plays its rightful, creative role.

Our particular present is a good time for taking stock: we have behind us a long tradition of dictionary-making, a rich heritage of reference works to study and analyse; we the lexicographers are ourselves dictionary users and know the frustrations; in electronic corpora, now fairly freely available, we have a wealth of lexical evidence undreamt of in the past; we have friends and colleagues in academia whose work we can learn from, and use in our own; new research (much of it by EURALEX members) is telling us about the way in which people use dictionaries, and what they use them for; and now at last we are liberated from the straitjacket of the printed page and alphabetical order.

If we are to exploit these propitious circumstances, if we are to create a new kind of dictionary, there are a few questions to be answered: first, questions about what our current dictionaries are, and why they are like that, and if they can be improved; then, questions about the new dictionary, who it is for, what they will want from it, and how we can provide that. In this paper I will be looking particularly at dictionaries for bilingual use, but (for reasons which I hope will become clear) I do not want to limit the discussion to 'bilingual dictionaries' as such.

1.1. The organization of current bilingual dictionaries

A systematic approach to the study of what a bilingual dictionary does and how it does it must take account of the following aspects of the entry:

type of data;
type of information;
function of that information (what the user can use it for);
mode of expression (how it is expressed);
type of user: Source Language (SL) speaker or Target Language (TL) speaker;
purpose of use (encoding into a foreign language, or decoding into one's own language).

It is easy to confuse type of data with type of information, and care must be taken to distinguish these two concepts. Moreover, they must both be differentiated from the function of the piece of information, that is, what the user can do with it. The material in Tables 1 and 2, together with the following example, will clarify the distinctions.

A francophone wishes to know how to translate une couche d'argile into English and looks up couche in a French-English dictionary. Figure 1 shows an extract from the Oxford-Hachette French-English Dictionary (1994) entry from which the example phrases and other items have been omitted:

I nf 1 _(de vernis, peinture, d'apprêt)_ coat; (d'aliments, de poussière, neige) layer; 2 _(strate)_ stratum, layer. 3 _Sociol_ sector. 4 _(pour bébés)_ nappy, diaper.

Figure 1. Abridged entry for couche

The underlined segments (our underlining) of the entry do not all constitute the same data types, nor do they carry the same types of information, but they all have the same function, that of helping the francophone (SL) user to select the correct equivalent of the headword for the context it is to be used in (couche d'argile is layer of clay). The mode of expression of the first two items ((de vernis...), (strate)) and the fourth (pour bébés) is the SL, while that of the third item (Sociol) is a code, in this case common to both SL (sociologie) and TL (sociology). The situation is summarized in Table 1:

Item                              Data Type                       Information Type          Function
(de vernis, peinture, d'apprêt)   complementing sense indicator   SL collocates of couche   pinpoints relevant sense of couche
(strate)                          substituting sense indicator    synonym of couche         pinpoints relevant sense of couche
Sociol                            diatechnical label              semantic domain           pinpoints relevant sense of couche
(pour bébés)                      complementing sense indicator   real-world fact           pinpoints relevant sense of couche

Table 1. Types of data, information and functions

Table 2 gives an overview of the organization of the traditional bilingual dictionary entry. The planning and design of future bilingual dictionaries must take account of all of these factors.

1. lemma forms
   Mode: SL
   Information content: lexical form(s) of the HW[1]/subheadword
   Function: helps user find the information being sought
   User: enc[2] SL, dec[3] TL

2. phonetic transcription
   Mode: IPA code
   Information content: how the HW is pronounced
   Function: helps the non-native speaker pronounce the word correctly
   User: enc TL[4]

3. grammar form
   Mode: code
   Information content: part of speech, gender, etc. of HW
   Function: helps user find the information being sought
   User: enc SL, dec TL

4. sense or subsense counter
   Mode: alph/num code
   Information content: this is a distinct sense or subsense of the HW
   Function: helps user find the information being sought
   User: enc SL, dec TL

5. grammar usage item + translation
   Mode: SL + TL
   Information content: grammatical complementation of HW in this sense & its translation
   Function: helps TL user use SL item correctly; helps SL user identify the sense of the HW
   User: enc TL, enc SL

6. TL equivalent
   Mode: TL
   Information content: this is TL equivalent of HW in this sense
   Function: helps TL user understand; helps both users translate
   User: dec TL, enc SL

7. gloss
   Mode: TL
   Information content: an explanation of HW in this sense
   Function: helps TL user understand; helps both users translate
   User: dec TL, enc SL

8. typical example[5] + translation
   Mode: SL + TL
   Information content: this is how the HW in this sense is typically used & translated
   Function: helps SL user identify the sense of the HW; reassures SL user trying to translate SL item; helps TL user use SL item correctly
   User: enc SL, enc TL

9. problematic example[6] + translation
   Mode: SL + TL
   Information content: the HW in this context has a specific TL equivalent
   Function: helps SL user identify the sense of the HW; helps SL user avoid translating error
   User: enc SL

10. idiomatic example[7] + translation
   Mode: SL + TL
   Information content: the HW and context have this specific TL equivalent
   Function: helps TL user understand; helps both users translate
   User: dec TL, enc SL

11. diatechnical label
   Mode: code
   Information content: HW in this sense belongs to this semantic domain (Music, Science etc.)
   Function: helps both users select correct TL equivalent; helps SL user identify the sense of the HW
   User: dec TL, enc SL

12. stylistic label
   Mode: code
   Information content: using the SL or TL item in this sense is in (literary etc.) style
   Function: helps both users translate; helps TL user understand; helps SL user identify the sense of the HW
   User: enc SL, dec TL

13. register label
   Mode: code
   Information content: using the SL or TL item in this sense is in (informal etc.) register
   Function: helps both users translate; helps TL user understand; helps SL user identify the sense of the HW
   User: enc SL, dec TL

14. diatopic label
   Mode: code
   Information content: the SL or TL item in this sense belongs to X regional variety of the language
   Function: helps both users translate; helps TL user understand; helps SL user identify the sense of the HW
   User: enc SL, dec TL

15. diachronic label
   Mode: code
   Information content: the SL or TL item in this sense is (obsolete / old-fashioned etc.)
   Function: helps both users translate; helps TL user understand; helps SL user identify the sense of the HW
   User: enc SL, dec TL

16. evaluative label
   Mode: code
   Information content: using the SL or TL item in this sense is (pejorative etc.)
   Function: helps both users translate; helps TL user understand; helps SL user identify the sense of the HW
   User: enc SL, dec TL

17. sense indicator
   Mode: SL
   Information content: synonym or paraphrase of HW in this sense / other brief sense clue
   Function: helps SL user identify the sense of the HW
   User: enc SL

18. collocators
   Mode: SL
   Information content: typical subjects / objects of HW verbs, nouns modified by HW adjectives etc.
   Function: helps both users translate; helps SL user identify the sense of the HW
   User: enc SL, dec TL

19. collocators
   Mode: TL
   Information content: typical subjects / objects of TL equivalent verbs, nouns modified by TL equivalent adjectives etc.
   Function: helps both users translate
   User: enc SL, dec TL

20. cross-reference
   Mode: SL
   Information content: this other definiendum is relevant to the HW in this sense
   Function: helps users find the information being sought
   User: enc SL, dec TL

Table 2. The organisation of a bilingual dictionary entry

Notes on the contents of Table 2
[1] headword
[2] encoding (translating into or writing in the foreign language)
[3] decoding (understanding or translating from the foreign language)
[4] a TL speaker who stores the information for later use in encoding
[5] an SL example sentence in which the headword and context are amenable to virtually a word-to-word translation into the TL
[6] an SL example sentence which is easily understandable for the TL speaker but presents translation problems for the SL speaker
[7] a multiword expression (MWE) in which the headword figures, or an example containing such an MWE; the meaning of the MWE is idiomatic, and thus the SL item is semantically opaque to the TL user and not amenable to straightforward translation by the SL user.
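
To make the distinctions in Table 2 concrete, the following is a minimal Python sketch of how a few of its rows might be held as records and filtered by type of user and purpose of use. It is an illustration only: the class name `Component`, the function `relevant_components` and the small selection of rows are assumptions introduced here, and the user assignments follow one reading of Table 2 rather than a definitive transcription.

```python
from dataclasses import dataclass

# (purpose, speaker) pairs, following the abbreviations used in Table 2.
ENC_SL = ("enc", "SL")  # SL speaker encoding into the foreign language
DEC_TL = ("dec", "TL")  # TL speaker decoding from the foreign language
ENC_TL = ("enc", "TL")  # TL speaker storing information for later encoding


@dataclass(frozen=True)
class Component:
    """One row of Table 2: a type of data found in a bilingual entry."""
    data_type: str
    mode: str          # language or code in which the item is expressed
    users: frozenset   # the (purpose, speaker) pairs the item serves


# A hand-picked subset of rows (hypothetical selection, not the full table).
COMPONENTS = [
    Component("lemma forms", "SL", frozenset({ENC_SL, DEC_TL})),
    Component("phonetic transcription", "IPA code", frozenset({ENC_TL})),
    Component("TL equivalent", "TL", frozenset({ENC_SL, DEC_TL})),
    Component("problematic example + translation", "SL + TL", frozenset({ENC_SL})),
    Component("sense indicator", "SL", frozenset({ENC_SL})),
]


def relevant_components(user):
    """Return the data types that serve the given (purpose, speaker) pair."""
    return [c.data_type for c in COMPONENTS if user in c.users]


if __name__ == "__main__":
    print(relevant_components(DEC_TL))
    print(relevant_components(ENC_SL))
```

A filter of this kind shows in miniature why any one reader needs only part of a traditional entry: a TL speaker who is decoding is served by a different subset of components from an SL speaker who is encoding.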


1.2. Evaluation

The information in Table 2 allows us to evaluate an imaginary best example of our current bilingual dictionaries. If we are to design the dictionary of tomorrow, we need to be able to build on the good and improve the less good aspects of today's dictionaries. Looking at the various aspects of bilingual dictionaries set out in Table 2, we must consider what is good and must be retained, and what is less good, and must be improved.
1.2.1. Strengths

In the best of today's bilingual dictionaries, as Table 2 shows, there are many things to praise. I shall list these briefly:
(a) Wealth of information

Semantics: lexical items are carefully analysed and explained, and their various TL equivalents are set out clearly and helpfully.

Grammar: there is a commitment to include enough information (albeit often couched in opaque codes) to allow the foreign language expressions to be used correctly.

Collocation: this type of information is often drawn from corpora, and the tendency now is towards including this wherever possible.

Peripheral linguistic information, regarding style, register, region, currency, semantic domain and so on: dictionaries are very rich in this.

Pragmatics: this type of information often appears in the form of usage notes, or of extra-textual information in the front or back matter.

Up-to-date language: this is a priority for most publishers, and the tendency is more and more for corpora to supplement editors' card-index files.

(b) Excellent scholarly work

Lexicographical: the planning, design and implementation of today's top bilingual dictionaries are often excellent, and the editors of new dictionaries on the market are hard put to it to devise anything better in the same size and price range as their competitors.

Linguistic: the summary list (in Section (a) above) of the types of information painstakingly gathered, ordered, compressed and presented intelligibly gives enough evidence of this.


(c) User's needs are paramount

The lexicographers had a clear idea of the competence, objectives and needs of the users they were writing for, and this is evident from the content and presentation of the dictionary.

The explanatory material is rich and well thought out, and the metalanguage is tailored to the user who needs the information. The front and back matter, also, is well planned and informative, often including verb tables, other tabular information, and annotated sample pages to help the user to get the most out of the work.

Today's bilingual dictionaries are a pleasure to use: the books are clearly printed and attractively bound, and the text carefully designed to best serve the purpose of the publication.

Finally, today's dictionaries are excellent value for money. Few other books contain so much information per square centimeter, or entertain the discerning reader so well.

1.2.2. Weaknesses

We take a constructive approach to the task of identifying weaknesses in the bilingual dictionaries of today: it is from these flaws (often imposed by the limited technology of our immediate past) that we may draw our inspiration for the dictionary of tomorrow.
(a) Redundancies

As Table 2 shows, every entry is too rich for any one reader. It is layered with pieces of information which the reader does not need (what is actually redundant for any individual reader depends of course on the particular circumstances); this makes the dictionary harder to use. Research has shown that many dictionary users, particularly the less motivated, give up before finding the information they need, even when that information is reasonably prominent in the entry. The ideal dictionary should be tailored, or at least tailorable, to one particular type of user.
(b) Gaps in coverage

Ironically, in view of these redundancies, no current dictionary, however large, can hope for anything like comprehensive coverage, even if its scope is limited by date or regional variety. Space considerations are not the only reason for this inadequacy: certain linguistic phenomena make it impossible for a static dictionary (such as a print dictionary, or the same on CD-ROM) to predict semantic or lexical variants which may occur as single words or multi-word expressions (MWEs). A list of such phenomena would be long, but would certainly include the following (shown here with brief examples of each, taken from the Oxford lexicographical corpora):

neologisms, e.g. (from the dozen or so examples in the OUP US reading programme corpus) By introducing a certain gene a spare may be grown if a part of the anatomy is bobbitted. The Washington Times praises ... Bush for 'the systematic bobbitting of both Saddam Hussein and Manuel Noriega'.

polysemy (a much discussed topic: the following corpus sentences exemplify the lexical implication rule 'animal -> its meat'), e.g. It's not a porcupine, it's a hedgehog. That woman nearly had hedgehog stew.

variation in MWEs, e.g. (chosen from nine attested variants) whether he has taken a sledgehammer to crack a nut; accused of trying to crack a nut with a sledgehammer; the use of a sledgehammer for the cracking of a smallish nut.

creative exploitation of MWEs, e.g. (chosen from many more variants) the Dean shook in his shoes; unlikely to make any of the teams quake in their boots; Corman has every reason to quake in his boots; made the Redskins shake in their boots at these problems; I'm quivering in my Doc Marten's.

(c) Limited user involvement in equivalence selection

It is very hard for a bilingual dictionary user to tell if a word in Language A means the same as an unknown word in Language B, far less whether they diverge in style, register, collocational potential etc. It could be argued that, like 'true' synonyms in a single language, such cross-linguistic synonyms do not exist. Approximation in many, probably most, of the equivalences is inevitable. The lexicographer has to make decisions which rightly should be made by the dictionary user, who is the only person to know exactly what is needed in the other language. The ideal dictionary should offer the skilled user the chance to make his or her own judgement on equivalences, by scanning examples of the TL items (grouped according to meaning) in various types of context, as well as - for contrastive checking purposes - examples of the relevant meaning of the SL item in a wide variety of contexts.
(d) Distortion of SL analysis by needs of TL

The 'left-hand side' of a bilingual dictionary (the SL items) is never simply the same material as is to be found in a monolingual dictionary of the same size. The SL material is subtly distorted by the TL, in order to make the bilingual dictionary better, allowing, for instance, a very brief entry in cases where all or most of the senses of the SL item have the same TL equivalent. Such devices clearly make the dictionary much easier to use, and compaction of information allows more detail elsewhere. It does, however, prevent the dedicated user from getting a clear view of the potential of the SL item, which must be sought in a monolingual work. The ideal bilingual dictionary would be able to cater for all needs: impossible, of course, in a printed work.
(e) Restricted information

We often find when we are using a dictionary that we need more information either about a word in our own language or more often about an expression in the foreign language: research described in Atkins and Varantola (in press) shows that people often turn to a monolingual dictionary during a bilingual search. The ideal dictionary should offer monolingual functions (definitions, etymologies, usage notes) to the bilingual dictionary user. It should cater for the dictionary browser, as well as the user intent upon one task.
(f) Lack of collocational options

Space constraints make it impossible for users to see a wide range of collocational partners of the foreign language word they want to use. The ideal dictionary should allow the user to browse through genuine attested examples of the foreign expression in use in various types of texts.
(g) Restricted metalanguage: abbreviations, codes and symbols

Owing again to space constraints, much metalinguistic information is expressed in the form of abbreviations ('Naut', 'Archit' etc.), codes ('vt', 'npl', '+to-infin') or symbols (asterisks, daggers, bullet points etc.). For the less motivated dictionary user, these can be hard to understand.
(h) No formal thesaural functions

Lack of space and commercial pressures during the editing prevent a systematic semantics-based approach to compiling, and hence exclude the possibility of a full thesaurus as an integral part of a dictionary. 'Dictionary and Thesaurus' works usually consist of a small dictionary packaged with a selection of word-based synonymic material.
(i) No multilingual dimension

Multilingual dictionaries tend to be simple listings of equivalences across three or more languages. The most useful of these focus on specific semantic domains and technical terms. Again, lack of space and commercial pressures make a true multilingual dictionary impossible, but, even if these obstacles were removed, the bilingual dictionaries of today could not be transformed into multilingual dictionaries, because of the distortion of the SL analysis by the needs of the TL (discussed above). If a multilingual dictionary is to be compiled, we have to devise an analysis technique common to all the languages involved, and capable of recording without distortion the linguistic phenomena occurring in each language.
2. Devising tomorrow's dictionary

As the evaluation in 1.2 shows, even the best of current bilingual dictionaries suffer from serious deficiencies, but I would argue that lexicographers are now in a position to address almost all of them. Many of the obstacles to the creation of tomorrow's improved bilingual dictionary have been removed in the past few decades by the advent of the computer (computer-assisted lexicography, rich electronic text corpora as sources of lexicographical evidence, computational searches of dictionaries, and so on) and advances in linguistic theory, in particular - in my view at least - the development of frame semantics as a theoretical tool for multilingual contrastive descriptions. However, the greatest obstacle to the production of the ideal bilingual dictionary is undoubtedly cost. While we are now, I believe, in a position to produce a truly multidimensional, multilingual dictionary, the problem of financing such an enterprise is as yet unresolved.
2.1. Users and their needs

Every good dictionary starts from a clear idea of who its users are and what they are going to do with it. User profiles for bilingual dictionaries must of course include the user's native language. The new-style bilingual dictionary must cater equally well for speakers of Language A, and speakers of Language B. All metalanguage should be in the user's mother tongue (L1). This will obviously involve reduplication of effort at the compiling stage, but in an online dictionary should not result in redundant information at the point of use.

In a discussion of multilingual electronic dictionaries, it is important to distinguish between the content language and the presentation language. The content language constitutes the object of the lexicographical analysis and description: a monolingual database contains facts about one content language; a bilingual English-French dictionary involves two content languages, and so on. The presentation language is the language in which all metalinguistic information is couched, and also other types of information: in a monolingual French dictionary (one in which the content language is exclusively French), if English is selected as the presentation language the definitions as well as instructions for using the dictionary and the metalinguistic information might well be expressed in English. An electronic bilingual dictionary is able to offer the user a choice of presentation language, as well as of content language; it is indeed possible to envisage a situation where a Japanese speaker wishing to compare English and French consults the bilingual English and French dictionary in contrast mode and elects Japanese as the presentation language.

Furthermore, definitions, explanations and other metalinguistic information must be transparent: abbreviations, codes and symbols should be avoided. The familiar 'telegraphese' style of definitions and explanations may be abandoned. The new dictionary should be a pleasure to read. It must serve the following types of user activity:

understanding L2 (written and spoken)
translating L2->L1
translating L1->L2
expressing oneself in L2 (written and spoken)

(all four well known), and in addition:

learning more about L2
learning more about L1-L2 equivalences and contrasts

For some of the above tasks, some types of data will not be appropriate. For instance, a user trying to read in a foreign language will want the minimum of information, in order not to interrupt the concentration of the reading process. On the other hand, someone studying the language will want more detail, and someone with time to spare may simply wish to browse.

The new dictionary must give its users the opportunity to make their own decisions about equivalences: they should be able to consult as many examples as they need of words used in their various senses, each in a variety of contexts with a variety of collocate partners. They should be able to call up monolingual definitions for these words, to learn about their semantic relationships (of hyponymy, synonymy, antonymy etc.) with other items in the language and with items in the other language. The new bilingual dictionary will provide for its users an accurate reflection of the various meanings of a word, independent of the needs of TL equivalences.

Finally, the new bilingual dictionary must not overwhelm its user. This means that the user must have a say in what information the dictionary offers, and how it presents it. When, as will now be proposed, the dictionary is held in hypertext, it also means that serious thought must be given to making sure users can orient themselves effectively: it is easy to get lost in hypertext.
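
The distinction between content language and presentation language, together with the idea that the user decides how much information is shown, can be illustrated with a small Python sketch. Everything named here (`Sense`, `LABELS`, `render`, the `detail` parameter and the sample entry) is a hypothetical illustration, not part of the prototype described in this paper.

```python
from dataclasses import dataclass

# Metalanguage spelled out in full (no abbreviations), one string per
# presentation language. The labels are illustrative assumptions.
LABELS = {
    "noun": {"en": "noun", "fr": "nom"},
    "informal": {"en": "informal", "fr": "familier"},
}


@dataclass
class Sense:
    headword: str
    content_lang: str   # the language being described
    pos: str
    register: str
    definition: dict    # presentation language -> definition text


def render(sense, presentation_lang, detail=1):
    """Render a sense in the chosen presentation language.

    detail=1 gives the minimum (e.g. for someone reading a text);
    detail=2 adds register and definition (e.g. for someone studying).
    """
    line = f"{sense.headword} ({LABELS[sense.pos][presentation_lang]})"
    if detail >= 2:
        line += f" [{LABELS[sense.register][presentation_lang]}]"
        line += f": {sense.definition[presentation_lang]}"
    return line


# French is the content language; the user picks the presentation language.
BOUQUIN = Sense("bouquin", "fr", "noun", "informal",
                {"en": "book", "fr": "livre"})

if __name__ == "__main__":
    print(render(BOUQUIN, "en"))
    print(render(BOUQUIN, "en", detail=2))
    print(render(BOUQUIN, "fr", detail=2))
```

The same stored facts yield different displays: the content language stays fixed while the metalanguage and the amount of detail follow the user's choices.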
2.2. Exploiting new computational resources

The new-age bilingual dictionary must exploit the advantages of the electronic medium, of which the following are the principal (the letters in brackets below indicate the weakness or weaknesses, set out in 1.2.2, which the particular item addresses):

hypertext functionality eliminating linear text restrictions and opening the way to new types of information by offering new ways of presenting it (a, b, c, d, f, h, i);
no space constraints other than the need to avoid swamping the user (e, f, g, h, i);
no distortion of the source language description by the needs of the target language (d);
flexible compiling liberated from alphabetical order (h);
alternative ways of presenting information, as for example graphics (e);
rapid access to large amounts of lexicographical evidence in corpora (b, c, f);
large-scale user customization (a, c).

Various consequences for the new-style dictionary design are discussed below. Today's CD-ROM dictionaries, being little more than reincarnations of print dictionaries, do not exploit any of these opportunities. Computerized functions and processes currently realized or realizable, such as accessing virtually unlimited corpus material, or the computerization of compiling, typesetting and so on, are omitted from this discussion.

2.2.1. Real databases, real links and virtual dictionaries

One of the priorities of the new bilingual dictionary is to avoid the distortion to the source language analysis (noted in 1.2.2 d) by the pull of the target language equivalences to be offered in the entry: the more sense overlap there is in the SL and TL lexical equivalents, then the greater the distortion. This should not happen in our proposed new dictionary, which consists of two types of material: (a) the databases (compiled independently for each language) and (b) the dictionaries (the hypertext links, metalinguistic explanations, and instructions for use which are created separately for each dictionary). Figure 2 sets out the relationships: the shaded oblongs are dictionaries for human users; these are created by the four processes (simple extraction, partial translation, comparison and alignment) carried out on the two databases (which in our prototype hold analyses of English and of French).

Figure 2. Real databases, processes, and virtual dictionaries

(a) The databases

A monolingual database is created for each individual language, completely independently of any other, except that all the monolingual databases are compiled within the same theoretical framework (see 3.1) and most of the linguistic facts they hold are inter-compatible, allowing matching of equivalents according to a variety of criteria. This feature enables the production of various types of dictionaries (see (b) below) by adding hypertext links and explanatory material to the monolingual databases. The contents of these databases should as far as possible be formalized, in order to facilitate access by computers, both those serving information to the dictionaries, and hence to the dictionary users, and those automatically populating lexicons being built for other systems.
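
As a rough illustration of what a formalized database record might look like, the Python sketch below stores one lexical unit and serves it to two kinds of consumer: another system that builds its own lexicon, and a dictionary view for a human reader. The field names, the frame label and both function names are assumptions made for this sketch; the paper does not specify a record format.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class LexicalUnit:
    """A hypothetical formalized record in one monolingual database."""
    lemma: str
    language: str
    part_of_speech: str
    frame: str                    # placeholder for the shared framework
    register: str = "neutral"
    examples: list = field(default_factory=list)


def export_for_systems(unit):
    """Machine-readable form for systems populating their own lexicons."""
    return json.dumps(asdict(unit), ensure_ascii=False, sort_keys=True)


def extract_for_reader(unit):
    """A simple extraction of the same facts as a line for a human user."""
    text = f"{unit.lemma} ({unit.part_of_speech}, {unit.register})"
    if unit.examples:
        text += " e.g. " + unit.examples[0]
    return text


QUAKE = LexicalUnit("quake", "en", "verb", "Fear",
                    examples=["quake in their boots"])

if __name__ == "__main__":
    print(export_for_systems(QUAKE))
    print(extract_for_reader(QUAKE))
```

Because both outputs are derived from one record, the human-readable dictionary and the machine-readable lexicon cannot drift apart, which is one practical argument for formalizing the database contents.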
(b) The dictionaries

These will be of at least three types: monolingual, bilingual and multilingual, and indeed when enough dictionaries have been compiled the user will be able to switch dictionary types at will. Each type of dictionary will offer the user various levels of information, from brief and simple to long and complex. We may think of these as Level 1, Level 2 and so on.

Monolingual dictionaries may be used in two distinct ways: look-up mode, where the user is in search of a specific piece of information, and browsing mode, where a more relaxed reading takes place. Dictionary browsing is an activity to be specifically catered for in the dictionary of tomorrow, and the electronic medium offers new ways of making this type of dictionary use even more informative and agreeable.

Bilingual and multilingual dictionaries may function in at least two different modes: the traditional mode of bilingual dictionaries, which we term 'equivalence mode', and a new mode designed to satisfy the scholar or browser, 'contrast mode'.

Equivalence mode is intended to help users who have to perform specific tasks (such as translation, comprehension or self-expression; see 3.1); by a process of alignment (see Figure 2) expressions in Language A, which in our prototype dictionary is English, and Language B (French) are aligned on the basis of one or more specified condition(s), both the traditional ones (synonymy or near-synonymy, antonymy, style, register etc.), and the criterion pioneered in DELIS and described in Heid (1994) and Heid and Krüger (1996), namely the matching of frame elements expressed in the context of the words in question. Section 3.1 (5) explains the term frame element.

Contrast mode (or bilingual browsing mode) is intended for the person who wishes to find out more about how selected items compare in two or more languages. This produces the dictionary of the browser. It is compiled by the process termed comparison, and offers ways of contrasting the meaning and syntactic behaviour of chosen words across languages, with both textual and graphical explanations of similarities and differences.
(c) The processes and links

As Figure 2 shows, in the creation of the various types of dictionaries for human users, four different processes are envisaged. The diagram oversimplifies these processes but is useful for the purposes of explanation. All four processes involve the introduction of hypertext links, and, for each link, the concomitant metalinguistic information and operating instructions and guidelines.

Extraction is the name given to the process of selecting and linking items within one content language, and so results in a monolingual dictionary. Here extraction subsumes a certain amount of comparison and alignment of items in the same language, since the dictionary which is created by this process includes functions such as the matching and differentiation of near-synonyms.

Partial Translation is the name given to the process of creating a monolingual dictionary by the extraction process and bilingualizing various sections of it so that the language of presentation is different from the language being analysed and described, thus making it more accessible to speakers of other languages.

Comparison is the name given to the process of creating a 'contrast dictionary' where items in two or more languages are compared along various axes, such as meaning, syntax, style, register, collocational patterns etc., and particularly the way in which the elements of the semantic frame get expressed in the context of the words in question. The resultant dictionary allows the browsing user to discover and evaluate real similarities and differences between the items.

Alignment is the name given to the process of establishing equivalence links between items in two or more languages. This process involves designating one language as the 'departure' or 'source' language (the SL) and one as the 'arrival' or 'target' language (the TL) and the resulting 'equivalence' dictionaries are very close in function to the bilingual dictionaries we know today.

The term links is intended to cover the hypertext links themselves, together with any linguistic metalanguage compiled by the lexicographers in order to structure the information for the user, and any navigation instructions written by the lexicographers in order to help the user get the best out of the dictionary. Thus, in brief, the proposal is for a multilingual hypertext lexical resource in which the monolingual databases are real; links (including metalanguage and instructions) between database items are real; the dictionaries themselves are virtual.
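The distinction between real databases, real links and virtual dictionaries can be illustrated with a minimal sketch. It is not the prototype's implementation: the class names, the `view` function, and the treatment of crawl and ramper as aligned equivalents are assumptions made purely for illustration, and the sense identifier given to ramper is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One lexical unit stored in a monolingual database (real)."""
    language: str
    lemma: str
    sense: str


@dataclass(frozen=True)
class Link:
    """A stored hypertext link with its metalanguage (real)."""
    process: str   # "extraction", "partial translation", "comparison" or "alignment"
    source: Item
    target: Item
    note: str = ""  # metalinguistic explanation or navigation instruction


def view(links, process, source_language):
    """Derive a dictionary on demand (virtual): nothing here is stored."""
    entries = {}
    for link in links:
        if link.process == process and link.source.language == source_language:
            entries.setdefault(link.source.lemma, []).append(link.target.lemma)
    return entries


# Hypothetical data: the pairing and the French sense id are assumptions.
crawl = Item("en", "crawl", "2c")
ramper = Item("fr", "ramper", "placeholder-sense")
links = [
    Link("alignment", crawl, ramper, "equivalence link"),
    Link("alignment", ramper, crawl, "equivalence link"),
]

english_french = view(links, "alignment", "en")
print(english_french)
```

Under these assumptions, the same stored links yield an English-French or a French-English equivalence dictionary depending only on the arguments passed to `view`, which is the sense in which the dictionaries are virtual.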
2.2.2. Customizing

Another function to come into its own in the dictionary of the future is the ability to customize the dictionary to suit one's own circumstances; at present, dictionaries on CD-ROM allow a minimal amount of customization of inessentials, mainly in computational environment and selection of data types to be included in a search. Users will be able to customize the new dictionary according to their individual needs; in the case of the bilingual dictionary, the customization will bear largely on their knowledge of their own and the foreign language, and the task they are performing with the help of the dictionary. (Too much information in a bilingual dictionary is as bad as too little.) The following aspects affect the type, amount and complexity of data to be returned in response to a query, and also the way in which it is presented, and must be amenable to user preferences: content language; presentation language; type; mode; level.

2.3. Exploiting new linguistic resources

Computer-assisted compiling and online dictionaries offer the lexicographer the opportunity of creating a much fuller, more accurate and easier to use dictionary, whether it is monolingual or bilingual. As already noted, CD-ROM versions of print dictionaries do not (and cannot) take full advantage of the electronic medium. However, there are already in existence a number of techniques and functions which must be exploited in our dictionary of the future; these will not be discussed in detail here. They include the use of corpus analysis during the editing process, and the accessing of corpus citations from the appropriate dictionary sense by the user of the dictionary.

Similarly, not all the types of information which the new dictionary will offer will be the subject of our discussion. An online dictionary will naturally include all the types of information already available, albeit selectively, in current dictionaries (see Table 2 for the bilingual dictionary list, but there are others currently implemented, such as corpus frequency information in some learners' dictionaries of English), and these will not be discussed further in this paper. Here, I shall consider only one of the major lexicographical resources which tomorrow's electronic dictionaries must exploit: the growing body of relevant theoretical linguistic work.

The type of lexicographical analysis that has been implemented in the prototype Dictionary of the Future devised by Atkins et al. (1994) and in its bilingual successor (Atkins et al. (1996)) was based on the principles discussed in Atkins, Fillmore and Heid (1995). The technique was pioneered in DELIS, and is described in Heid and Krüger (1994). Space considerations prevent a detailed account here of the analysis of the motion frame which gave rise to the prototype entry for the English verb crawl in the first hypertext dictionary and the French verb ramper in the second. However, in order to introduce the demo of the prototype of tomorrow's bilingual dictionary, I include now, as exemplification, some brief extracts from the work on the motion frame, and from the hypertext entry for crawl.

3. Creating monolingual databases

The lexicographical analysis resulting in the monolingual database is a threefold operation: (1) the description of the frame; (2) the compilation of the lexical entries; and (3) the compilation of the thesaurus, involving a feature analysis of the lemmas in each frame. These are briefly outlined below.
3.1. Description of frame

The principal steps in this stage of the lexicography are:

1. Selection of semantic domain to work on, and identification of frames to be described within the domain. Example: In the semantic domain of space, one might expect to describe amongst other frames the frame of motion, perhaps in terms of the subframes of locomotion, positional change etc.; within locomotion itself one might wish to distinguish the subframes of manner of motion (crawl, limp), speed of motion (dash, amble), sound of motion (clatter, roar) and so on.

2. Preliminary description of frame and compilation of working list of frame elements with which the verbs' behaviour may be comprehensively described. Example: Table 3 shows a list of the motion frame elements currently used in the analysis of crawl, verb and noun, together with corpus examples in which the expression instantiating the frame element is capitalized.

Frame Element   Corpus Sentence
MOVER           THE SURVEYOR will ... crawl into the loft.
AREA            Some bees were already crawling OVER THE EARLY CLOVER.
PATH            It can only escape by crawling ALONG A NARROW CHANNEL.
SOURCE          Exhausted fugitives crawl FROM THE LAKE.
GOAL            She was crawling INTO THE TENT when she heard the sound.
DISTANCE        It took him fifty minutes to crawl FIFTY YARDS.
MANNER          He crawled ON TOES AND ELBOWS round the Land-Rover.
SOUND           I pictured them crawling SILENTLY through the mud.
SPEED           I crawled SMARTLY after him.
VEHICLE         We crawled through the city IN HIS CAR.
MEDIUM          You crawl ON THE GROUND looking for worms.
EVENT           It was A LONG CRAWL back to where he had left the tent.

Table 3. Expression of motion frame element in context of crawl

The motion frame elements were identified by means of an analysis of a number of sentence subcorpora, sets of sentences containing a representative lexical unit evoking the frame (for instance, a high-frequency verb or nominalization). First, the frame elements were identified in the sentences; next each was associated with its instantiating sentence constituent and the grammatical phrase type and sentence function of each was noted. Example: Figure 3 shows the links between frame elements and their lexical and grammatical realizations in a corpus sentence; each complex description constitutes one valence formula.

Figure 3. Motion frame elements: two realizations

3. Listing of lemmas which (in one or more of their meanings) evoke this frame, and hence for which lexical entries are to be written in terms of the elements of the frame. Example: The list of verbs evoking the motion frame would run to many hundreds, of which some examples are walk, run, swim, fly, pass, go, come, leave and so on.

4. For each of the lemmas, an analysis of the corpus data and recording of the way in which the frame elements are expressed in the context of each lemma. See Table 3, and Figure 3. Such an analysis normally results in a refinement of the preliminary frame description, as new phenomena are discovered.

5. Listing of valence formulas (see Figure 3) for each of the lexical units analysed. A valence formula comprises (a) a Frame Element Group (FEG), that is, a configuration of frame elements that co-occur in a given structure (e.g. phrase, clause, sentence) headed by that lemma (see the example sentence in Figure 3, which realizes the FEG 'MOVER, GOAL'); and, for each frame element in the group, (b) a specification of sortal features (indicating the 'selectional' properties of the constituents that can instantiate it); and (c) its possible grammatical realization. The group of valence formulas associated with one sense of a lemma constitute its valence description. Example: Table 4 shows a valence formula for crawl.

Sentence: The surveyor will crawl into the loft.
Valence formula: [MOVER / subject / NP / person] crawl [GOAL / adjunct / PP-in / direction]

Frame element          MOVER     GOAL
Grammatical function   subject   adjunct
Phrase type            NP        PP-in
Sortal features        person    direction

Table 4. A valence formula for crawl [2c]
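To make the structure of Table 4 concrete, the following sketch shows one way a valence formula, its Frame Element Group and a valence description might be represented. The class and function names are assumptions introduced here for illustration; only the example formula and sentence come from Table 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One frame element with its grammatical realization and sortal feature."""
    frame_element: str
    function: str
    phrase_type: str
    sortal: str

    def render(self):
        return f"[{self.frame_element} / {self.function} / {self.phrase_type} / {self.sortal}]"


@dataclass(frozen=True)
class ValenceFormula:
    lemma: str
    before: tuple  # slots realized before the lemma
    after: tuple   # slots realized after the lemma

    def feg(self):
        """The Frame Element Group: the frame elements that co-occur."""
        return tuple(slot.frame_element for slot in self.before + self.after)

    def render(self):
        parts = [s.render() for s in self.before]
        parts.append(self.lemma)
        parts.extend(s.render() for s in self.after)
        return " ".join(parts)


def valence_description(annotated):
    """Group annotated corpus sentences under the formula each instantiates."""
    description = {}
    for sentence, formula in annotated:
        description.setdefault(formula, []).append(sentence)
    return description


# The formula of Table 4.
formula = ValenceFormula(
    "crawl",
    before=(Slot("MOVER", "subject", "NP", "person"),),
    after=(Slot("GOAL", "adjunct", "PP-in", "direction"),),
)
print(formula.render())
print(formula.feg())

description = valence_description(
    [("The surveyor will crawl into the loft.", formula)]
)
```

Keeping each formula linked to the sentences that instantiate it mirrors the requirement, stated in 3.2, that every formula in a valence description remain linked to the annotated corpus sentences from which it was derived.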

6. Refinement of the frame description, and definition of the formal metalanguage (frame element names, grammatical codes etc.) to be used for the description of phenomena within the frame.
3.2. Compilation of lexical entries

This stage of the lexicography involves the following tasks, in respect of each of the lexical units listed in 3.1(3):

1. Scrutiny of corpus sentences, working with (i) the description of the frame and a checklist of frame elements finalized under 3.1(6) (see Table 3), and (ii) the list of FEGs identified in the representative lexical units analysed under 3.1(4) (described at point 5a above, and see Figure 3).

2. For each sentence: identification of frame elements realized by its constituents, and markup of valence formulas (see point 5 above, and Table 4), associating each element with its appropriate sortal feature(s) and its grammatical realization in the sentence.

3. Post-editing of computationally extracted valence description (set of valence formulas, see Table 4) for the lexical unit (i.e. the lemma, or headword, in that particular sense), each formula linked to the annotated corpus sentences from which it was derived, and other sentences assigned to that lexical unit from the corpus sentences including that lemma. The valence description of that lexical unit forms part of the database, and dictionary, entry for the word.

4. When all senses of a lemma have been analysed, write definitions, complete the various sections of the entry, and draw up the semantic network of that lemma. Example: Figure 4 shows the semantic network for crawl, verb and noun.

In the diagram in Figure 4, each sense is (i) numbered in such a way as to reflect semantic relationships, and (ii) assigned a mnemonic ('[humans]', '[plant]', '[time]' and so on) to help users to navigate more easily round the hypertext entry. The meanings discerned for crawl, after the study of over 700 sentences from the OUP current English corpus, are illustrated by the following corpus citations, each linked to one of the sense identifiers:

[1]      A ladybird crawled up a dry stalk.
[1a]     The feeling of insects crawling on the skin ...
[1a1]    By the time he got back, our room was crawling with cops.
[1a1a]   He is simply crawling with money nowadays.
[1a2]    Its members had been crawling inside details of federal grant programmes.
[2]      I spent ages crawling around the hotel's foundations.
[2a]     Fit stair gates before your baby starts crawling.
[2b]     He is so weak he has to crawl upstairs.
[2b1]    Too tired to do anything more, he crawled into bed.
[2c]     The surveyor will pull up carpets and crawl into the loft.
[2c1]    Let's stop trying to get women to support us by crawling to them.
[2d]     Dark heavy clouds were crawling across the sky.
[2d1]    hugging the road that crawls around the mountains ...
[2d2]    In the blackout the train crawled exasperatingly.
[2d2a]   He dipped his headlights and began to crawl round the bends.
[2d3]    We watched the wide waves crawling in from the Atlantic.
[2d4]    She was having friendly chats as she crawled down the list.
[2d4a]   The days before Christmas seemed to crawl past.
[3]      He looked at the dark green ivy crawling up the walls.

Figure 4. Semantic network for crawl, verb and noun

In Figure 4, the shading of the central rectangle indicates that the meaning distinction described there is so general as not to be directly lexicalized, but gives rise to all the senses developing from it. The 'core' meaning of crawl, verb and noun, may be thought of as the tripartite sense contained in the centre shaded rectangle.

Rectangles with rounded corners and bold lines refer to 'literal uses' of this word: the first division is 'primary means of locomotion' (for snakes, bugs and other creatures) versus 'secondary means' (for human beings, whose natural means is walking), the latter being further subdivided according to the reason behind the adoption of a secondary means of moving.

Regular rectangles with roman type refer to 'extended senses' of the word crawl, and these extended senses develop from different literal uses: the network diagram was devised in order to show these relationships. In addition, although there is not room on the diagram to include these, the various lines linking meanings can each be labelled, according to the type of semantic change involved. The label for the line linking [2c: deliberate] to [2c1: grovel] is 'Metaphor'; that linking [2d2: vehicle] to [2d2a: rider] is 'Metonymy: riders as their vehicles' (see the corpus citations accompanying Figure 4 for example sentences illustrating the various uses).

Regular rectangles with italic typeface refer to idioms. Our design assumes an Idioms Database, with hypertext links from each item there to the appropriate sense of the various component words.
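A semantic network of this kind can be sketched as a set of labelled edges between sense identifiers. The sketch below rests on one assumption that the text does not state outright: that a sense's parent is obtained by dropping the last digit-run or letter-run of its identifier (so [2c1] develops from [2c]). The two labelled links are those given above; the function names are hypothetical and all other links are left unlabelled.

```python
import re

# Sense identifiers as read from the citation list accompanying Figure 4.
SENSES = [
    "1", "1a", "1a1", "1a1a", "1a2",
    "2", "2a", "2b", "2b1", "2c", "2c1",
    "2d", "2d1", "2d2", "2d2a", "2d3", "2d4", "2d4a",
    "3",
]

# Only these two link labels are stated in the text.
LABELS = {
    ("2c", "2c1"): "Metaphor",
    ("2d2", "2d2a"): "Metonymy: riders as their vehicles",
}


def parent(sense_id):
    """Assumed convention: drop the final digit-run or letter-run."""
    parts = re.findall(r"\d+|[a-z]+", sense_id)
    return "".join(parts[:-1]) or None


def edges(senses):
    """Return (parent, child, label) triples; label is None when unstated."""
    result = []
    for sense in senses:
        up = parent(sense)
        if up is not None:
            result.append((up, sense, LABELS.get((up, sense))))
    return result


for up, down, label in edges(SENSES):
    print(f"[{up}] -> [{down}]  {label or ''}")
```

If the numbering convention assumed here does not hold for every sense, the `parent` function would need to be replaced by an explicit table of links compiled by the lexicographers.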
3.3. The compilation of the thesaurus

Compiling the thesaural sections of the new dictionary involves (i) the selection of a frame to work on; (ii) a feature analysis of each of the words which evoke that frame. This is not to claim any theoretical value for a decompositional approach to word meaning; however, in the prototype dictionary it has proved a useful method of differentiating among semantic neighbours, in this case the co-hyponyms of the verb move. In the prototype dictionary, comparing verbs and feature sets is an interactive process: the lexicographers' task is to compile the feature set for each verb (and noun, and adjective etc.) in the frame, ensuring that the contrastive descriptions which result from this reflect not only the native speaker's intuition about the core sense of the word but also insights gained from a study of the word's behaviour in the corpus evidence.

The components of meaning of each verb are recorded in the form of semantic features attached to the elements of the frame. Thus, in the case of crawl, for instance, the MANNER frame element (see Table 3 for the elements so far discerned for the motion frame) is noted as being 'body-angle: horizontal', MEDIUM as being 'ground', rather than 'air' or 'liquid', SPEED as being typically 'slow', and so on. In this approach, a verb may be marked or unmarked in respect of any frame element; if marked, then the options depend on the element in question. For instance, crawl is marked for MANNER (you cannot crawl upright, or erect, or on tiptoes); enter, on the other hand, is not marked for MANNER (you can enter somewhere erect or on all fours, gracefully or awkwardly, and so on). Verbs like sidle or crabcrawl are marked for PATH (which is lateral, rather than forward, backward, up, down, etc.); verbs like crawl, enter and swim are not.

Verbs compared: crawl, creep, amble, wriggle, swim, fly

Frame element   Semantic feature
EVENT           continuous
EVENT           non-continuous
EVENT           salient
EVENT           non-salient
MANNER          body-angle: horizontal
MANNER          body-angle: vertical
MANNER          surface contact: constant
MANNER          surface contact: low
MANNER          motion: autonomous
MANNER          motion: non-auton.
MEDIUM          air
MEDIUM          ground
MEDIUM          liquid
SOUND           level: loud
SOUND           level: soft
SPEED           level: fast
SPEED           level: slow

[The X marks assigning each feature to individual verbs are not recoverable from this copy.]

Table 5. Feature-based contrasts

Table 5 shows a partial contrastive analysis based on some of the features which are used to differentiate the meaning of verbs evoking the frame of motion. In the hypertext dictionary, the process is a dynamic one. Words may be contrasted according to their semantic features (as in Table 5), but it is also possible to submit an arbitrary group of semantic features and receive a listing of the verbs whose meaning incorporates them.
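The two dynamic operations just described (contrasting words by feature, and retrieving verbs from a group of features) might be sketched as follows. The feature values for crawl and sidle are those given in the running text of this section; the dictionary layout and function names are assumptions, and the empty entry for enter is a simplification, since the text states only that enter is unmarked for MANNER and PATH.

```python
# Feature sets keyed by frame element; an absent key means "unmarked".
FEATURES = {
    "crawl": {
        "MANNER": "body-angle: horizontal",
        "MEDIUM": "ground",
        "SPEED": "slow",
    },
    "sidle": {"PATH": "lateral"},
    "enter": {},  # simplification: only MANNER and PATH are stated as unmarked
}


def is_marked(verb, element):
    """True if the verb carries a feature for this frame element."""
    return element in FEATURES[verb]


def verbs_with(wanted):
    """List the verbs whose meaning incorporates every requested feature."""
    return sorted(
        verb
        for verb, features in FEATURES.items()
        if all(features.get(element) == value for element, value in wanted.items())
    )


def contrast(a, b):
    """Frame elements on which two verbs differ (None means unmarked)."""
    elements = sorted(set(FEATURES[a]) | set(FEATURES[b]))
    return {
        element: (FEATURES[a].get(element), FEATURES[b].get(element))
        for element in elements
        if FEATURES[a].get(element) != FEATURES[b].get(element)
    }


print(verbs_with({"MEDIUM": "ground", "SPEED": "slow"}))
print(contrast("crawl", "sidle"))
```

Treating "unmarked" as the absence of a key keeps the marked/unmarked distinction explicit without inventing feature values for verbs the text does not describe.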
4. Creating dictionaries

A further task for the monolingual lexicographer is to decide on the functionality required in the monolingual dictionaries to be extracted from the database (see Figure 2), and to set up the hypertext links, design the screen displays, and compile the metalinguistic explanations and user guidelines for each function and each display. Because of the user customization imperative (see 2.2.3), it is planned to offer for most types of information (definition, syntax, examples, etc.) levels of complexity and amounts of data which depend on the user's declared objective in using the dictionary, standard of competence in language and degree of interest in the dictionary contents. Consequently, the editorial design and the lexicography needed to implement that design are extremely detailed and complex.

The bilingual lexicography is (as it always is) vastly more complicated than the monolingual. Bilingual and monolingual lexicographers must work together if it is decided to produce a version of a monolingual dictionary for users whose mother tongue is not the content language. For these users, metalinguistic explanations, user guidelines and even conceivably the definitions themselves must be translated.

When the dictionaries being compiled from the databases are to be contrast dictionaries or equivalence dictionaries, then the functionality becomes even more complex. The designers' task is to ask themselves: what would users of various degrees of competence, with different objectives and needs, want from this resource? What is the best way to display the information without swamping the reader? How best can the user customize each aspect of the dictionary? The planning stage will involve lexicographers, linguists, computational linguists and computer scientists.

Creating the dictionaries by establishing the hypertext links, and writing the explanations and guidelines, is the task of the bilingual lexicographers. They have to study the contents of the two databases, to decide whether Item X in Database A and Item Y in Database B should be linked as equivalents; to select and manipulate all the other types of information to be extracted from the databases and edited into the dictionaries. All the types of information listed in Table 2 will obviously be included, but there will of course be an added dimension of thesaural cross-linguistic contrasts and equivalences.
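The customization aspects named in 2.2.2 (content language, presentation language, type, mode and level) could be captured in a small preferences record that filters what an entry displays. This is a sketch under stated assumptions: the `Preferences` class, the `validate` and `select` functions, and the representation of entry sections as (level, text) pairs are all hypothetical, and the mode sets list only the modes named in 2.2.1, which the paper describes as a minimum.

```python
from dataclasses import dataclass

# Modes named in 2.2.1; the paper says "at least" these, so the sets are not exhaustive.
MODES = {
    "monolingual": {"look-up", "browsing"},
    "bilingual": {"equivalence", "contrast"},
    "multilingual": {"equivalence", "contrast"},
}


@dataclass(frozen=True)
class Preferences:
    content_language: str       # language being described
    presentation_language: str  # language of explanations and guidelines
    dictionary_type: str        # "monolingual", "bilingual" or "multilingual"
    mode: str
    level: int                  # 1 = brief and simple; higher = fuller


def validate(prefs):
    """Reject combinations the design does not provide for."""
    if prefs.dictionary_type not in MODES:
        raise ValueError(f"unknown dictionary type: {prefs.dictionary_type}")
    if prefs.mode not in MODES[prefs.dictionary_type]:
        raise ValueError(f"mode {prefs.mode!r} not offered for {prefs.dictionary_type}")
    if prefs.level < 1:
        raise ValueError("level must be 1 or higher")
    return prefs


def select(sections, prefs):
    """Keep only the entry sections at or below the requested level."""
    return [text for level, text in sections if level <= prefs.level]


# Hypothetical entry sections, each tagged with the level at which it appears.
sections = [(1, "brief equivalent"), (2, "valence formulas"), (3, "corpus citations")]
prefs = validate(Preferences("en", "fr", "bilingual", "equivalence", 2))
print(select(sections, prefs))
```

A record of this kind makes explicit the point that too much information is as unhelpful as too little: the same stored entry yields different displays for different declared needs.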
5. Envoi

We have at our disposal the knowledge to plan, and the computational and linguistic capabilities to implement, a radically new type of bilingual dictionary. It will demand more of the lexicographers, more energy for sifting lexicographical evidence and more intellectual effort to understand and systematize what is found there. It will require the collaboration of linguists and linguistically aware computer scientists, and can be produced only if there is a continuous and efficient dialogue between them and the lexicographical team. It will undoubtedly cost more initially than any standard print dictionary. But in this forum, if not yet in publishers' planning meetings, let us look beyond the currently possible and set our sights on the distant ideal. A demonstration will be given of a prototype dictionary of the future (Atkins et al. (1996)), conceived as a multilingual hypertext dictionary, which will subsequently be available for consultation on the World Wide Web.

Notes
1

My analysis of the bilingual dictionary entry was carried out within the EC Compass project (LRE 62-080) and is taken from Deliverable 24 of that project: Adapting Bilingual Dictionaries for On-line Comprehension Assistance, Atkins et al (1996).

Bilingual Dictionaries - Past, Present and Future


2

25

The names of the data types are taken from Compass Deliverable I: Terminology for Bilingual Dictionaries in Computational Lexicography, Elisabeth Breidt. This encompasses the maximal entry in a bidirectional bifunctional dictionary, i.e. one designed to be used by speakers of either of its two languages, for encoding or decoding (see Al (1983)), and consequently highly redundant for any particular user. Individual dictionaries vary of course from this model (for instance, in the subset of the data types which constitute an entry, or the choice of SL or TL for metalanguage), but where the book is to be sold in two markets most standard bilingual dictionaries offer most of these data types, and overall hold much the same types of information. This assumes that every design decision made for the hypothetical dictionary is the best possible, and that the editorial policy was carried out during the editing process in the best possible way. The terms lexical item and item in this sense are intended to cover both single- and multi-word expressions. See Atkins and Varantola (in press, and in preparation). These are the corpus resources used by the lexicographers of Oxford University Press, and include the British National Corpus, the various corpora created by the OUP reading programme, and historical corpora. My thanks to John Simpson for these examples. For the reader who needs elucidation, two further citations from the same corpus might be helpful: To bobbin - short for to depenistrate, and Last month, a Taiwanese wife bobbitted her husband with a pair of scissors after learning of his affairs with other women. One of the seminal works is Apresjan (1973); more recently, see Nunberg and Zaenen (1992); for a study of this phenomenon in the context of computational lexicography, see Copestake and Briscoe (1995). The expression 'lexical implication rule' was coined in Ostler and Atkins (1992). . My thanks to Rosamund Moon, who drew my attention to these examples. 
" See Marello ( 1989) Part I, Chapter 2, for an excellent discussion of this, also Zgusta (1984), Snell-Hornby (1984, 1986 and 1990), and Duval (1991). Krista Varantola (personal communication) points out however that lexicographers are often better linguists than the person using the dictionary, and care must be taken to avoid abdication of responsibility towards the less skilled dictionary user. The advanced dictionary users of course are those who will benefit from selective access to corpus data (see Varantola (1994)). The OHFD entry for column contains two senses ('gen colonne/" and 'Journ rubrique/"); the entry in the Concise'Oxford Dictionary (1995), with a similar-sized headword list, is set out in six senses, three of which are further subdivided. See Kromann (1989), and Kromann et al. (1984, 1989) for further discussion of this. Compiling entries for'words in semantic sets entails an additional pass through the wordlist, greatly increasing the time and expense of dictionary production. For instance, the English adjective civil would require to be compiled in the 'Military', and the 'Social Behaviour' sets, as well as figuring in compounds like civil servant and civil engineering, which themselves belong to different semantic sets. When all such uses had been compiled individually, the final version of the entry would have to be assembled. Reducing this to the correct length might then have a knock-on effect on the various sets involved. Editors have nightmares of an infinite loop.
10 12 13 14

15

16

See Fillmore (1985, 1993a and b, 1995), Fillmore and Atkins (1992, 1994) and Atkins (1995) for a discussion of frame semantics and its application to lexicographical analysis. Preliminary budgeting suggests that a monolingual hypertext dictionary of the type discussed here would be equivalent in editorial costs to a similar very large multivolume monolingual

26

B.T.S. Atkins scholarly dictionary. Such works are never undertaken for commercial reasons. Bilingual and multilingual versions would be proportionately more expensive. The results of a frame-semantics-based multilingual analysis may be seen in the prototype fivelanguages lexicon of Perception and Speech Act verbs produced during the DELIS project and described in Heid and Kruger (1996), while the Dictionary of the Future presentation (see Atkins et al (1995)) demonstrated a prototype entry in a multidimensional hypertext dictionary. Since adding a presentation language to the multilingual database will involve a lot of work it has to be assumed that this situation is quite a long way into the future. This topic is well discussed in the literature: see Hausmann (1977), AI (1983), Kromann (1987) and Bogaards (1990), among others. The objective of the Compass Project (LRE 62-080), now successfully completed, was to develop the prototype of just such a dictionary; see Breidt and Feldweg (ms). See Cowie (forthcoming) for a discussion of the application of a frame-based approach to the analysis of idioms for lexicographical purposes. This term denotes a lemma in one of its meanings Each lexical unit may evoke a different frame and consequently a polysemous word is likely to participate in the analysis of many frames. The network is the idea of Charles Fillmore and this description of the meanings of crawl is to a considerable extent his work As the work in DELIS indicated (although this aspect was not fully developed during the project), wordclasses other than verbs also evoke frames; see Fillmore (1995) for a description of applying frame semantics to the analysis of nouns. So far, this operation has been performed only for verbs. 
My thanks go to Marie-Hélène Corréard, Ulrich Heid, Carla Marello and Krista Varantola for their valuable comments on an earlier version of this paper, and I acknowledge with gratitude the unique contribution to the design of the hypertext dictionary by J. B. Lowe, whose computational expertise called it into being, and Charles Fillmore, whose ideas it attempts to embody. The WWW version may be found at: http://www.linguistics.berkeley.edu/hyperdico/


Bibliography

Al, B. P. F. (1983b). Dictionnaire de thème et dictionnaire de version. Revue de Phonétique Appliquée, Vols. 66-68: pp. 203-211.
Apresjan, J. D. (1973). Regular polysemy. Linguistics, 142. Mouton, The Hague.
Atkins, B. T. S. (1995). Analyzing the verbs of seeing: a frame semantics approach to corpus lexicography, in: S. Gahl, C. Johnson, A. Dolbey (eds.) Proceedings of the Twentieth Annual Meeting of the Berkeley Linguistic Society, 1994. BLS, University of California, Berkeley, CA.
Atkins, B. T. S., Charles J. Fillmore, John B. Lowe and Nancy Urban (1994) The Dictionary of the Future: A Hypertext Database. Presentation and on-line demonstration at the Xerox-Acquilex Symposium on the Dictionary of the Future, Uriage, France, (ms).
Atkins, B. T. S., Charles J. Fillmore and Ulrich Heid (1995) Lexicographical Relevance in Corpus Evidence, Deliverable D-IX-2 of DELIS Project (LRE 61.034).


Atkins, B. T. S., Charles J. Fillmore, John B. Lowe, Marie-Hélène Corréard and Nancy Urban (1996) The Dictionary of the Future: A Multilingual Hypertext Dictionary. Sample entry available on WWW.
Atkins, B. T. S. & K. Varantola (in press): Monitoring Dictionary Use, International Journal of Lexicography.
Atkins, B. T. S. & K. Varantola (in preparation): Language Learners Using Dictionaries: The Final Report of the EURALEX and AILA Research Project into Dictionary Use.
Bogaards, P. (1990a). Deux langues, quatre dictionnaires. Lexicographica, 6. eds. F. M. Dolezal et al, Niemeyer, Tübingen.
Breidt, Elisabeth and Helmut Feldweg (unpublished ms.) Accessing Foreign Languages with COMPASS, submitted to forthcoming special issue of Machine Translation.
Copestake, A. A. and E. J. Briscoe (1995) Regular polysemy and semi-productive sense extension, in Journal of Semantics, Vol. 12, pp 15-67.
Cowie, A. P. (forthcoming) Semantic frame theory and the analysis of phraseology, in Proceedings of the Second International Symposium on Phraseology, Moscow, 2-4 April 1996, L. Minaeva (ed.). Moscow: Moscow State University.
Duval, A. (1991). L'équivalence dans le dictionnaire bilingue, in Hausmann, F. J., Reichmann, O., Wiegand, H. E., and Zgusta, L., eds., Wörterbücher. Ein internationales Handbuch zur Lexikographie Band 3, Handbücher zur Sprach- und Kommunikationswissenschaft, 2817-2823. Berlin and New York.
Fillmore, Charles J. (1985) Frames and the Semantics of Understanding, in Quaderni di Semantica VI:2.
Fillmore, Charles J. (1993a) 'Corpus Linguistics' or 'Computer-aided armchair linguistics', in Nobel Symposium Proceedings. Stockholm.
Fillmore, Charles J. (1993b) Frame semantics and perception verbs, in: Hans Kamp and James Pustejovsky (eds.), Universals in the Lexicon: At the Intersection of Lexical Semantic Theories, ms., Dagstuhl.
Fillmore, Charles J. (1995) The Hard Road from Verbs to Nouns, in In Honor of William S-Y. Wang (eds. Matthew Chen and Ovid Tzeng): Pyramid Press.
Fillmore, Charles J. & B. T. S. Atkins (1992) Towards a Frame-based Lexicon: the Semantics of RISK and its Neighbors, in A. Lehrer & E. F. Kittay (eds.), Frames, Fields and Contrasts: New Essays in Semantic and Lexical Organization, pp. 75-102. Lawrence Erlbaum Associates: Hillsdale, New Jersey.
Fillmore, Charles J. & B. T. S. Atkins (1994) Starting where the Dictionaries Stop: the Challenge of Corpus Lexicography, in B. T. S. Atkins & A. Zampolli (eds.), Computational Approaches to the Lexicon, pp. 349-393. Oxford University Press, Oxford UK.
Hausmann, F. J. (1977). Einführung in die Benutzung der neufranzösischen Wörterbücher, in Romanistische Arbeitshefte, 19. Tübingen: Max Niemeyer Verlag.

Hausmann, F. J. (1988). Grundprobleme des zweisprachigen Wörterbuchs, in: Hyldgaard-Jensen, K. and Zettersten, A., (eds.), Symposium on Lexicography III. Proceedings of the Third International Symposium on Lexicography May 14-16, 1986 at the University of Copenhagen, Lexicographica Series Maior 19, 137-154, Tübingen.
Heid, Ulrich and Katja Krüger (1994) On the DELIS Corpus Evidence Encoding Scheme (CEES), Deliverable D-III-0 of DELIS (LRE 61.034), January 1994.
Heid, Ulrich and Katja Krüger (1996) A multilingual lexicon based on Frame Semantics, in Proceedings of AISB96 Workshop on Multilinguality in the Lexicon, (eds.) Lynne Cahill and Roger Evans. University of Sussex, UK.
Heid, Ulrich (1994) Contrastive Classes - Relating Monolingual Dictionaries to Build an MT Dictionary, in: Ferenc Kiefer, Gabor Kiss, Julia Pajzs (eds.), Papers in Computational Lexicography Complex '94, (Budapest: Linguistics Institute, Hungarian Academy of Sciences), 115-126.
Kromann, H.-P. (1987) Zur Typologie und Darbietung der Phraseologismen in Übersetzungswörterbüchern, In: Korhonen, J. (ed.) Beiträge zur allgemeinen und germanistischen Phraseologieforschung: Internationales Symposium in Oulu, Juni 1986. Oulu: Universität Oulu.
Kromann, H.-P. (1989). Neue Orientierung der zweisprachigen Wörterbücher. Zur funktionellen zweisprachigen Lexikographie. In: Snell-Hornby, M. and Pohl, E., (eds.), Translation and Lexicography, Paintbrush.
Kromann, H.-P., Riiber, T., and Rosbach, P. (1984). 'Active' and 'passive' bilingual dictionaries: the Ščerba concept reconsidered. In: Hartmann, R. R. K., (ed.), LEXeter '83 Proceedings. Papers from the International Conference on Lexicography at Exeter, 9-12 September, 1983, Lexicographica Series Maior 1, 207-215, Tübingen. Niemeyer.
Kromann, H.-P., Riiber, T., and Rosbach, P. (1989). Principles of bilingual lexicography. In: Hausmann, F. J., O. Reichmann, H. E. Wiegand and L. Zgusta (eds.) Dictionaries, Dictionnaires, Wörterbücher, Ein internationales Handbuch, 2711-2728. Berlin: de Gruyter.
Marello, C. (1989). Dizionari bilingui. Zanichelli, Bologna.
Nunberg, G. and Zaenen, A. (1992). Systematic polysemy in lexicology and lexicography. In: Varantola, K., Tommola, H., Salmi-Tolonen, T., and Schopp, J., eds., The Proceedings of EURALEX '92, Tampere, Finland. Department of Translation Studies, University of Tampere.
Snell-Hornby, M. (1984). The bilingual dictionary - help or hindrance? In: Hartmann, R. R. K., (ed.), LEXeter '83 Proceedings. Papers from the International Conference on Lexicography at Exeter, 9-12 September, 1983, Lexicographica Series Maior 1, Tübingen. Niemeyer.

Snell-Hornby, M. (1986). The bilingual dictionary - victim of its own tradition. In: Hartmann, R., ed., The History of Lexicography. John Benjamins, Amsterdam and Philadelphia.
Snell-Hornby, M. (1990). Dynamics in meaning as a problem for bilingual lexicography. In: Tomaszczyk, J. and Lewandowska-Tomaszczyk, B., eds., Meaning and Lexicography (John Benjamins Publishing Company).
Varantola, Krista (1994) The Dictionary User as Decision Maker, in Martin, W., W. Meijs, M. Moerland, E. ten Pas, P. van Sterkenburg and P. Vossen (eds.), Euralex 1994 Proceedings, Amsterdam: Vrije Universiteit Amsterdam, pp. 606-611.
Zgusta, L. (1984). Translation equivalence in the bilingual dictionary. In: Hartmann, R. R. K., ed., LEXeter '83 Proceedings. Papers from the International Conference on Lexicography at Exeter, 9-12 September, 1983, Lexicographica Series Maior 1, Tübingen. Niemeyer.