Creating a termbase.

Terminology Management Report


Juan Yborra Golpe

This is a report on how we built a termbase for a Terminology Management module at Swansea University. Our first step was to read the book Terminology: Theory, methods and applications, by María Teresa Cabré. The book was extremely helpful, as it explains step by step how to build a professional termbase. We also made use of the PowerPoint presentations we were given in the module. We were asked to develop two monolingual termbases (one in English and one in Spanish), along with a correspondence record for the terms. The main problem we faced was the departure of one of the members of the group, which meant that only two people did all the work. Luckily, we were told that, because of these special circumstances, we did not need to include as many terms as the other groups. The first thing we did was choose the domain we were going to work on. After some discussion and several proposals, we eventually chose the vestibular system as our research domain.

Division of work
Once we reached that point, we decided to carry out a small experiment. My teammate would search for English terms on her own while I did the same for Spanish ones. A couple of days later, we met again to compare our results and were surprised to find that almost 50% of the terms we had extracted were the same, although obviously in different languages. After that, we decided to divide the work and do it during the holidays, when we had more spare time.

Creating the termbase


Documentation
First of all, we concluded that careful documentation, that is, choosing trustworthy resources to build the textual corpus, was crucial.

To achieve this, we agreed that the best way to find reliable medical journals was through official institutions. As I was a student at the University of Granada for a few years, I know its electronic library quite well and I know that many prestigious journals can be accessed through it. We therefore used a VPN (Virtual Private Network) connection to enter the electronic library of the University. Once there, we started our search by looking for English articles and journals. We then used the PubMed database to access MEDLINE (Medical Literature Analysis and Retrieval System Online), one of the best bibliographic databases for life sciences and biomedical information. Thanks to it, we found four or five extremely useful articles from prestigious journals. Unfortunately, finding Spanish articles was notably harder, as the number of articles about the vestibular system in Spanish is low, or at least much lower than in English. We also had to change databases, since the previous ones covered English documents only. Luckily, we found a very useful journal, Revista de otorrinolaringología y cirugía de cabeza y cuello, hosted on SciELO (Scientific Electronic Library Online), a trustworthy bibliographic database supported by the State of São Paulo (Brazil) that contains hundreds of scientific journals in English, Spanish and Portuguese. Thanks to this exhaustive research, we gathered a good number (approx. 45) of English and Spanish texts and, more importantly, we managed to build a highly trustworthy textual corpus for our termbase.

Software choices
Before extracting terms from the textual corpus, we decided which software we were going to use. I had previous experience with SDL Multiterm at the University of Granada, so we chose it for building the termbase. For the extraction process, we agreed to use SDL Multiterm Extract: since both tools are made by the same company, we would not need to worry about conversion or integration problems when exporting the extracted terms from the text corpus to the termbase creator (SDL Multiterm).

Finally, we decided to use Microsoft Excel to create the correspondence records between terms, since we were both familiar with it.

Extraction process
As mentioned above, we used SDL Multiterm Extract, which was already installed in Lab. C of the University, to extract all the terms needed for our termbases. Before the extraction could take place, we had to convert the documents from pdf to a readable format, because Multiterm Extract does not support pdf files (we decided to convert everything into rtf files). After that, a little editing was needed, because Microsoft Word had placed a paragraph break at the end of every line. To solve this, we followed the instructions provided in one of the documents on Blackboard and used Search/Replace to replace the paragraph breaks with spaces. Once these problems had been solved, we ran the extraction, but we did not get as many results as expected, so we slightly changed the search parameters to allow a little more noise in the extraction process. We obtained more results that time, and we carefully chose those terms that had an unambiguous equivalent in both languages, in order to avoid trouble when creating the correspondence record later. We also chose to include synonyms, initialisms and terminological phrases to make the termbase more interesting and challenging. As for the examples of each term in actual use that the termbase required, we did not simply take the first entries offered by SDL Multiterm Extract. We tried to use those that also contained other terms from the termbase, so as to create a strong network among all the terms by means of hyperlinks and cross-references. Finally, we exported the chosen terms to a Word file, selecting the Tab delimited option, and removed the non-validated candidate terms. This was the last step we completed before the holidays: we had noticed that, whereas the demo version of Multiterm gave us no trouble at all when creating the termbase, the demo version of Multiterm Extract did not allow us to export the extracted terms, so we had to do the export using the campus facilities.
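The clean-up step described above, replacing the break at the end of every line with a space, was done with Search/Replace in Word. As a rough illustration only, the same idea could be sketched in Python for plain text copied out of a pdf. The function name `unwrap_hard_breaks` is hypothetical, and the sketch assumes that a blank line marks a real paragraph boundary while a single line break is just a visual wrap.

```python
import re


def unwrap_hard_breaks(text):
    """Join visually wrapped lines, keeping blank-line paragraph boundaries.

    Assumption: a blank line separates real paragraphs; a single line
    break inside a paragraph is only a wrap left over from conversion.
    """
    # Normalise Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split on blank lines, which are treated as paragraph boundaries.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for paragraph in paragraphs:
        # Collapse each remaining break (and surrounding spaces) into one space.
        joined = re.sub(r"\s*\n\s*", " ", paragraph).strip()
        if joined:
            cleaned.append(joined)
    return "\n\n".join(cleaned)


sample = "The utricle is part\nof the vestibule.\n\nThe saccule is\nsmaller."
print(unwrap_hard_breaks(sample))
```

Note that this sketch does not rejoin words hyphenated across a line end; those would still need manual checking before running a term extraction.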

Recording the terms into a termbase


After all this preparatory work, we proceeded to record the extracted terms in the two monolingual termbases in SDL Multiterm. Our main aim was to create an extremely compact and well-linked termbase, in which every entry would have at least one cross-reference to another term. We were asked to build a termbase that specified the following information:

Citation form: Almost all entries were nouns, and we had little trouble here because all the terms we chose appeared in the texts in a proper form according to lexicographic conventions (e.g. nouns in the singular, adjectives in the singular masculine form in languages with gender marking on adjectives, etc.).

Definition: We needed to provide a proper terminological definition. To do so, we used online medical dictionaries and the information we found in the text corpus. One of our concerns was to avoid circular definitions and entries, as we read in Cabré's book:

Definitions should not be circular:
dense: having relatively high density
density: the quality or condition of being dense

To avoid this, we made sure that definitions used well-known words and that, whenever a more specific word was used, that word was a term defined in the same termbase, for example:

utricle: The part of the vestibule of the ear into which the semicircular canals open.
vestibule: The parts of the membranous labyrinth comprising the utricle and the saccule and contained in the cavity of the bony labyrinth.

Notes on the definition: We had to give the sources of each definition and comment on their degree of authority but, as we have said before, we were extremely careful during the documentation step.

Examples: Of the term in actual use. We provided examples and, in addition, tried to find examples in which other terms from the termbase appeared.

Semantic relationships: Most of them were holonymy/meronymy relationships, except for one case in which we found a hyperonymy/hyponymy relationship. It was impossible for us to relate all the words in the termbase semantically, although we managed to do so for most of them. In any case, we managed to connect all the terms through their definitions, each of which contains at least one other term from the termbase, thus developing an interesting network.

Synonyms: We included synonyms and initialisms in our termbase, indicating which ones are predominant and why.

Regarding the software, everything worked quite well apart from, truth be told, some minor problems with SDL Multiterm. We had some trouble creating the cross-references and hyperlinks until we realised that they have to be created in editing mode, but without opening the field that contains the word to be used for the hyperlink. Apart from that, one of the worst things about SDL Multiterm is that it is almost impossible to edit the parameters and field labels once a new termbase has been created. That editing problem aside, SDL Multiterm is, in our opinion, extremely efficient and useful for a terminologist.
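The goal that every entry should mention at least one other term of the termbase in its definition can be checked mechanically. What follows is only a minimal sketch under stated assumptions, not something we ran at the time: the termbase is represented as a plain Python dictionary (not the SDL Multiterm format), the function names are hypothetical, and the matching is a naive whole-word search that allows a plural "s". The utricle and vestibule definitions are the ones quoted above; the saccule entry is an illustrative placeholder.

```python
import re


def cross_references(termbase):
    """Map each term to the other termbase terms found in its definition."""
    links = {}
    for term, definition in termbase.items():
        found = set()
        for other in termbase:
            if other == term:
                continue
            # Whole-word, case-insensitive match; optional "s" for simple plurals.
            pattern = r"\b" + re.escape(other) + r"s?\b"
            if re.search(pattern, definition, flags=re.IGNORECASE):
                found.add(other)
        links[term] = found
    return links


def unlinked_terms(termbase):
    """Return the terms whose definitions mention no other term."""
    links = cross_references(termbase)
    return sorted(term for term, refs in links.items() if not refs)


termbase = {
    "utricle": "The part of the vestibule of the ear into which the "
               "semicircular canals open.",
    "vestibule": "The parts of the membranous labyrinth comprising the "
                 "utricle and the saccule and contained in the cavity of "
                 "the bony labyrinth.",
    # Illustrative placeholder entry, not taken from our termbase.
    "saccule": "The smaller of the two sacs of the membranous labyrinth.",
}

print(cross_references(termbase))
print(unlinked_terms(termbase))
```

In this toy data the placeholder saccule entry is reported as unlinked, which is the kind of entry whose definition or example would need rewording to join the network.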

Correspondence Records
We used Microsoft Excel to create the Correspondence Record. We used three columns: the first with the English term, the second with the Spanish term and the last with notes on the differences in correspondence between the terms. As we had previously chosen terms with close equivalents in both languages, we did not find many problems, just some details that we mentioned in the notes column.
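For readers who prefer a plain-text file to a spreadsheet, the same three-column layout could be kept as a tab-delimited file. The sketch below is an assumption-laden illustration, not our actual record: the column headings, the function names and the sample rows (including the note text) are hypothetical, and it uses Python's standard csv module rather than Excel.

```python
import csv
import io

# Hypothetical column headings mirroring the three columns described above.
HEADER = ["English term", "Spanish term", "Notes on correspondence"]


def write_correspondence(rows, handle):
    """Write (english, spanish, notes) tuples as a tab-delimited table."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for english, spanish, notes in rows:
        writer.writerow([english, spanish, notes])


def read_correspondence(handle):
    """Read the table back into an English-to-Spanish lookup dictionary."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["English term"]: row["Spanish term"] for row in reader}


# Illustrative rows only; the note is a placeholder, not a real finding.
rows = [
    ("utricle", "utrículo", ""),
    ("saccule", "sáculo", "placeholder note on a difference in usage"),
]

buffer = io.StringIO()
write_correspondence(rows, buffer)
buffer.seek(0)
lookup = read_correspondence(buffer)
print(lookup)
```

Keeping the notes in their own column, as in the Excel sheet, means the lookup stays simple while the comments on differences remain attached to each pair.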

Conclusion
Although it has been a long and hard job, we tried to do it as well as we could by following a meticulous process. We started with an in-depth search for trustworthy sources during the documentation phase, and continued with a careful choice of terms during the extraction. The last challenge came when trying to link all the terms in the termbase and thus create a good network of terms from the same domain. More importantly, I learned a lot through the process and I am sure it will be really helpful for me in the future.
