
 A corpus constitutes the raw textual material for various forms of linguistic analysis.

 E.g.: by analyzing a corpus of documents written by a specific author, objective data
can be gained about the use of certain words and phrases.
 E.g.: a lexicographer preparing a dictionary of teenagers’ slang will put together a body
of texts, mostly spoken language or scripts from Internet chat sessions. This corpus can
then be used to identify the entries, words or phrases for the dictionary, along with
examples of actual use.
 E.g.: French political scientists interested in finding out how much coverage their
country gets in American newspapers (as opposed to other countries, perhaps) might
turn to a corpus comprising American newspaper articles. The political scientists
would then start their search by counting the hits for certain keywords, such as France,
Paris, Chirac or de Gaulle.
 E.g.: For translators, corpora offer a wide variety of uses. The most helpful is probably
the chance to test one’s own tentative translation (especially when translating into a
foreign language) against the background of a large selection of original text written in
the target language.
 E.g.: Parallel corpora of source texts and translations can also play an important role
in the formulation phase of a translation, in the field of translation memories, and
indeed in the scholarly analysis of the translation process.
 The different types of corpora are as numerous as their functions. In order to differentiate
between them, we can refer to the following six categories:
 Medium of original representation: oral (needs transcription tools) vs. written
 Medium of corpus representation: print vs. electronic (some also involve audio
and video, for verbal and non-verbal communication)
 Number of languages: monolingual (e.g. British National Corpus (BNC) at Oxford
University) vs. multilingual
 Characteristics of the selected texts: regional (international vs. regional), social
(children, teenagers, men, women), historical variation (diachronic (development of
language over time, e.g. Brown-Frown corpus) or synchronic (state of the
language in a defined period, e.g. Collins Cobuild Corpus))
 Characteristics of text preparation: annotated vs. plain texts, statistical factors
 Corpus functions: linguistics, philology (the scientific study of the development of
language), lexicography, translation, computational linguistics and information
management.
 E.g.: For translators to check the validity of a given expression or phrase.
 E.g.: An archive of scientific texts is a valuable source of terminology for translators.
 It is not always possible to find existing corpora that contain texts covering the
specific area that a translator might specialize in. In this case a translator might
want to put together their own customized corpus.
 Given the number of texts available in digital form, a fair-sized corpus can be
compiled in quite a short period of time. The Internet in particular – itself a
mega-corpus – can be a prime source of digital texts.
 In addition, many clients provide translators with a large amount of company and
product-related text information in digital form.
 In order to analyze a corpus quickly and to find certain expressions, translators need
to have access to software that enables them to retrieve data efficiently. This
software is usually a text analysis program such as WordSmith, developed by Mike
Scott.
 Wordlist: A list of all the words occurring in the texts selected for analysis. The list
can be sorted either alphabetically or by frequency. In addition, statistical
information is available on word and sentence lengths, type-token ratios, etc.
 Keywords: This tool allows you to compare a word list from a shorter article with the
list from a larger text collection. This means you can find out which terms
(keywords) from one list are ‘most unusually frequent’ in the short text. These
keywords can then be used as indicators of the text’s content.
 Concord: A concordance shows the occurrence of a given search term in its textual
context (i.e. the words to its right and left). This is used by terminologists to extract
terms.
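The wordlist and concordance ideas described above can be illustrated with a minimal Python sketch. This is not how WordSmith itself is implemented; the function names (`tokenize`, `wordlist`, `concordance`) and the sample sentence are hypothetical, and the simple regular-expression tokenizer assumes English-like text (Chinese text would first need a word segmenter).

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (assumes Latin-script text)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def wordlist(tokens, by="frequency"):
    """Return (word, count) pairs sorted alphabetically or by descending frequency."""
    counts = Counter(tokens)
    if by == "alphabetical":
        return sorted(counts.items())
    # Ties in frequency are broken alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def concordance(tokens, term, width=3):
    """Return (left context, term, right context) for every hit of the search term."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == term:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Hypothetical sample text, for illustration only.
sample = "The corpus is small. A corpus can be large, and a large corpus helps."
tokens = tokenize(sample)
freq = wordlist(tokens)
kwic = concordance(tokens, "corpus", width=2)

for left, term, right in kwic:
    print(f"{left:>10} | {term} | {right}")
```

Each concordance line shows the search term with up to `width` words on either side, which is the keyword-in-context view a terminologist would scan when extracting terms.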
Type-token ratio

Description                     Translated texts    Non-translated texts
Types of conjunctions (a)       64                  52
Tokens (running words) (b)      339,895             342,043
Type-token ratio (a/b * 100)    0.019               0.015
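As a quick check of the arithmetic, the ratio a/b * 100 can be recomputed from the type and token counts given in the table. The function name `type_token_ratio` is only an illustrative choice.

```python
def type_token_ratio(types, tokens):
    """Return the type-token ratio as a percentage: types / tokens * 100."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return types / tokens * 100

# Counts taken from the table: conjunction types (a) and running words (b).
ttr_translated = type_token_ratio(64, 339_895)
ttr_non_translated = type_token_ratio(52, 342_043)

print(round(ttr_translated, 3), round(ttr_non_translated, 3))  # 0.019 0.015
```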

Comparing the conjunctions in more detail, 18 conjunctions are found in the
translated texts (TT) but not in the non-translated texts (NT). Conversely, only 6
conjunctions are found in the NT but not in the TT. The TT thus contain many more
atypical lexicogrammatical selections of conjunctions that do not adhere to the NT
norms.
No.    Translated texts              Non-translated texts
       Conjunctions     Frequency    Conjunctions       Frequency
1.     并bing[and]      1390         并bing[and]        1598
2.     如ru[if]         1242         如ru[if]           886
3.     而er[and]        1127         但dan[but]         780
4.     则ze[then]       1058         而er[and]          490
5.     但dan[but]       498          如果ruguo[if]      293
Total                   5315                            4047

These findings reveal that, in general, the two sets of texts use similar types of
conjunctions: four of the top five conjunctions occur in both lists. Of these,
并bing [and] and 而er [and] are additive, 如ru [if] is conditional (positive:
if…then), and 但dan [but] is adversative. This could further imply
that the logical-semantic relations in both sets of IT texts rely
heavily on these binding functions.
No.   Conjunctions           Frequency in        Keyness     Frequency in
                             translated texts                non-translated texts
1.    则ze[then]             1058                572.5756    224
2.    而er[and]              1127                244.9664    490
3.    但是danshi[but]        238                 176.3241    31
4.    只要zhiyao[if only]    210                 140.0773    33
5.    以yi[so that]          420                 139.325     141

No.   Conjunctions               Frequency in        Keyness     Frequency in
                                 translated texts                non-translated texts
1.    但dan[but]                 498                 69.05212    780
2.    并bing[and]                1390                19.37121    1,598
3.    鉴于jianyu[in view of]     7                   13.94654    28
4.    以免yimian[lest]           4                   5.211603    13
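
The source does not say which statistic produced the keyness scores above. Log-likelihood is one statistic commonly used for keyword analysis, so the sketch below shows how such a score can be computed from two frequencies and two corpus sizes. It assumes the two-term form of the formula and takes the running-word counts from the type-token table as corpus sizes; the function name `log_likelihood` is hypothetical, and the results should not be expected to match the figures in the tables exactly.

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-term log-likelihood score for a word seen freq_a times in a corpus
    of size_a tokens and freq_b times in a corpus of size_b tokens."""
    total = size_a + size_b
    # Expected frequencies if the word were equally likely in both corpora.
    expected_a = size_a * (freq_a + freq_b) / total
    expected_b = size_b * (freq_a + freq_b) / total
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Assumed corpus sizes: the running-word counts reported for each sub-corpus.
TT_TOKENS, NT_TOKENS = 339_895, 342_043

score_ze = log_likelihood(1058, TT_TOKENS, 224, NT_TOKENS)  # 则ze[then]
score_er = log_likelihood(1127, TT_TOKENS, 490, NT_TOKENS)  # 而er[and]
print(round(score_ze, 1), round(score_er, 1))
```

A higher score means the difference in relative frequency between the two sub-corpora is less likely to be due to chance; the score itself does not indicate in which sub-corpus the word is more frequent, so that has to be read from the frequencies.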

No.   Correlative constructions         Frequency in        Frequency in
                                        translated texts    non-translated texts
1.    因yin[because]…而er[and]          22                  15
2.    由于youyu[due to]…而er[and]       20                  7
3.    因为yinwei[because]…而er[and]     3                   3
4.    由于youyu[due to]…则ze[then]      1                   0
5.    如ru[if]…则ze[then]               769                 131
6.    如果ruguo[if]…则ze[then]          166                 43

No.   Double conjunctions                            Frequency in        Frequency in
                                                     translated texts    non-translated texts
13.   且qie[and] 如果ruguo[if]                       22                  0
14.   且qie[and] 若ruo[if]                           1                   0
15.   且qie[and] 无论如何wulunruhe[in any case]      3                   0
16.   且qie[and] 只要zhiyao[if only]                 10                  0
17.   同时tongshi[at the same time] 并bing[and]      1                   0
18.   因此yinci[therefore] 如ru[if]                  1                   0
19.   但dan[but] 如ru[if]                            22                  26

No.   Conjunctions     Frequency of     Explicitation    Percentage
                       conjunctions
1.    并bing[and]      1390             154              11.08
2.    如ru[if]         1242             150              12.08
3.    而er[and]        1127             858              76.13
4.    则ze[then]       1058             1048             99.05

 Simplification…the idea that translators subconsciously simplify the language or
message or both
 Explicitation…the tendency to spell things out in translation, including, in its
simplest form, the practice of adding background information
 Normalization or conservatism…the tendency to conform to patterns and
practices that are typical of the target language, even to the point of exaggerating
them
 Levelling out…the tendency of translated text to gravitate around the centre of
any continuum rather than move towards the fringes
(Baker 1997:176-7)
