The OEC: Facts about the language

The 20-volume historical Oxford English Dictionary is the largest record of words used in English, past and present. It contains words that are now obsolete or rare (such as xenagogue 'a person who guides strangers' and vicine 'neighbouring or adjacent') in addition to the latest coinages such as phishing and podcast. The second edition of the OED, published in 1989 and consisting of twenty volumes, contains more than 615,000 entries, and the third, available online, is expanding all the time, with batches of 2,500 new and revised words and phrases being added in regular quarterly updates.

How many words are there in English?

It is a question often asked, but not so easily answered. Even the OED does not set out to include every specialized technical term or slang or dialect expression ever used. New words are constantly being invented, developed from existing words, or adopted from other languages. Most will be used rarely, or only by a small group of people. This means that an unlimited number of words may occur in speech and writing which will never be recorded in even the largest dictionary.

Furthermore, what exactly is a word? Clearly we should include single units such as cat and dog. But are the plurals cats and dogs separate words? Should we include compounds such as walking stick, which are made up of two existing words? There are an almost unlimited number of such two-word compounds, which can't all be included in a dictionary. And what about abbreviations like BBC and Dr, or proper names such as London, Nelson, and Harry Potter: are they words? As you can see, the question is not a straightforward one.

How many words do we use?

Although it may be impossible to know the number of words in English, the Oxford English Corpus can help us assess the number of words in current use. Instead of talking about words, it's more useful in this context to talk about lemmas, a lemma being the base form of a word. For example, climbs, climbing, and climbed are all examples of the one lemma climb.

Just ten different lemmas (the, be, to, of, and, a, in, that, have, and I) account for a remarkable 25% of all the words used in the Oxford English Corpus. If you were to read through the corpus, one word in four (ignoring proper names) would be an example of one of these ten lemmas. Similarly, the 100 most common lemmas account for 50% of the corpus, and the 1,000 most common lemmas account for 75%. But to account for 90% of the corpus you would need a vocabulary of 7,000 lemmas, and to get to 95% the figure would be around 50,000 lemmas.

The remaining 5% of the corpus consists of a very large number of lemmas which occur rarely: words like oidore or parados, which may occur only once every several million words. Like all natural languages, English consists of a small number of very common words, a larger number of intermediate ones, and then an indefinitely long 'tail' of very rare terms.
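The idea of grouping word forms under a lemma and then asking how much of a text the commonest lemmas account for can be sketched in a few lines of Python. This is only an illustration on a toy sentence: the LEMMA_OF mapping and the function names are hypothetical, invented for this example, and say nothing about how the Oxford English Corpus itself is lemmatized or counted.

```python
from collections import Counter

# Hypothetical, hand-made form-to-lemma mapping for illustration only;
# a real corpus would need a full lemmatizer.
LEMMA_OF = {
    "climbs": "climb", "climbing": "climb", "climbed": "climb",
    "cats": "cat", "dogs": "dog",
}

def lemmatize(token):
    """Return the lemma for a token, falling back to the lowercased token."""
    word = token.lower()
    return LEMMA_OF.get(word, word)

def lemma_counts(tokens):
    """Count how often each lemma occurs in a sequence of tokens."""
    return Counter(lemmatize(t) for t in tokens)

def coverage(counts, k):
    """Share of all tokens accounted for by the k most frequent lemmas."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total

def lemmas_needed(counts, target):
    """Smallest number of top lemmas whose combined share reaches target."""
    total = sum(counts.values())
    running = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        if running / total >= target:
            return rank
    return len(counts)

tokens = "the cat climbs the tree and the dogs climbed the hill".split()
counts = lemma_counts(tokens)
print(counts.most_common(3))
print(f"top lemma covers {coverage(counts, 1):.0%} of the tokens")
print(f"lemmas needed for 50% coverage: {lemmas_needed(counts, 0.5)}")
```

On a real corpus the same two questions (how much do the top k lemmas cover, and how many lemmas are needed for a given share) produce the kind of figures quoted above.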

Vocabulary size     % of content    Example lemmas
(no. of lemmas)     in OEC
10                  25%             the, of, and, to, that, have
100                 50%             from, because, go, me, our, well, way
1,000               75%             girl, win, decide, huge, difficult, series
7,000               90%             tackle, peak, crude, purely, dude, modest
50,000              95%             saboteur, autocracy, calyx, conformist
>1,000,000          99%             laggardly, endobenthic, pomological
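The diminishing return from each additional block of vocabulary can be made explicit with a little arithmetic on the figures quoted above. The sketch below assumes the last row's "more than a million" can be treated as exactly 1,000,000 for the purpose of the calculation; the names COVERAGE and marginal_gains are made up for this example.

```python
# Coverage figures as quoted in the text and table:
# (vocabulary size in lemmas, share of the corpus covered, in percent).
# Assumption: the final row is a lower bound and is taken here as 1,000,000.
COVERAGE = [
    (10, 25),
    (100, 50),
    (1_000, 75),
    (7_000, 90),
    (50_000, 95),
    (1_000_000, 99),
]

def marginal_gains(rows):
    """For each consecutive pair of rows, return (extra lemmas, extra percentage points)."""
    gains = []
    for (size_a, pct_a), (size_b, pct_b) in zip(rows, rows[1:]):
        gains.append((size_b - size_a, pct_b - pct_a))
    return gains

for extra_lemmas, extra_points in marginal_gains(COVERAGE):
    per_thousand = 1000 * extra_points / extra_lemmas
    print(f"+{extra_lemmas:>9,} lemmas -> +{extra_points:>2} points "
          f"({per_thousand:.3f} points per 1,000 lemmas)")
```

The first 90 lemmas beyond the top ten add 25 percentage points, whereas the last step needs roughly 950,000 further lemmas to add 4.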

The long tail means that to account for 99% of the Oxford English Corpus you would need a vocabulary of more than a million lemmas. This would include some words which may occur only once or twice in the whole corpus: highly technical terms like chondrogenesis or dicarboxylate, and one-off coinages like bootlickingly or unsurfworthy that people would probably understand but would be unlikely to use.

If we decide that around 90-95% of the corpus gives a reasonable idea of an average vocabulary, we are left with a figure somewhere in the range of 7,000-50,000 lemmas: say, 25,000. What does a vocabulary of this size represent? It represents the set of most significant words in English: those which occur reasonably frequently and which account for all but a small part of everything we may encounter in speech or writing. It includes all the words that we actively use in general everyday life.

It's interesting to note that most reasonably sized dictionaries contain significantly more than 25,000 lemmas. The 11th edition of the Concise Oxford English Dictionary, for example, lists more than 75,000 single-word lemmas, which means that the majority of its entries must belong to the long tail of extremely rare words. This makes good sense: such terms occur very infrequently, but when they do they are likely to be crucial to what's being said, and the reader might well want to look them up. The idea of a quantifiable vocabulary should be seen in this light: the words we ignore for the purposes of the exercise may be very rare, but in context they may be very important.

What is the commonest word?

Based on the evidence of the Oxford English Corpus, which currently contains over 2 billion words, the 100 commonest English words found in writing around the world are as follows:

 1  the        26  they       51  when       76  come
 2  be         27  we         52  make       77  its
 3  to         28  say        53  can        78  over
 4  of         29  her        54  like       79  think
 5  and        30  she        55  time       80  also
 6  a          31  or         56  no         81  back
 7  in         32  an         57  just       82  after
 8  that       33  will       58  him        83  use
 9  have       34  my         59  know       84  two
10  I          35  one        60  take       85  how
11  it         36  all        61  people     86  our
12  for        37  would      62  into       87  work
13  not        38  there      63  year       88  first
14  on         39  their      64  your       89  well
15  with       40  what       65  good       90  way
16  he         41  so         66  some       91  even
17  as         42  up         67  could      92  new
18  you        43  out        68  them       93  want
19  do         44  if         69  see        94  because
20  at         45  about      70  other      95  any
21  this       46  who        71  than       96  these
22  but        47  get        72  then       97  give
23  his        48  which      73  now        98  day
24  by         49  go         74  look       99  most
25  from       50  me         75  only      100  us

It's noticeable that many of the most frequently used words are short ones whose main purpose is to join other, longer words rather than determine the meaning of a sentence. These are known as 'function words'. It could be said that it's more interesting to explore the frequency of 'content words', as shown in the list below:

Rank  Nouns         Verbs    Adjectives
 1    time          be       good
 2    person        have     new
 3    year          do       first
 4    way           say      last
 5    day           get      long
 6    thing         make     great
 7    man           go       little
 8    world         know     own
 9    life          take     other
10    hand          see      old
11    part          come     right
12    child         think    big
13    eye           look     high
14    woman         want     different
15    place         give     small
16    work          use      large
17    week          find     next
18    case          tell     early
19    point         ask      young
20    government    work     important
21    company       seem     few
22    number        feel     public
23    group         try      bad
24    problem       leave    same
25    fact          call     able

Nouns

The commonest nouns are time, person, and year, followed by way and day (month is 40th). The majority of the top 25 nouns (15) are from Old English, and of the remainder, most came into medieval English from Old French, and before that from Latin. Notice that many of these words are very common because they have more than one meaning: way and part, for example, are listed in the Concise OED as having 18 and 16 different meanings respectively. They often also form part of common phrases: some of the frequency of time, for example, comes from its use in adverbial phrases like on time, in time, last time, next time, this time, etc.

Verbs

As you would expect, the commonest verbs express basic concepts. Strikingly, the 25 most frequent verbs are all one-syllable words; the first two-syllable verbs are become (26th) and include (27th). Of these 25, 20 are Old English words, and three more, get, seem, and want, entered English from Old Norse in the early medieval period. Only try and use came from Old French. It seems that English prefers terse, ancient words to describe actions or occurrences.

Adjectives

Again, most of the top adjectives are one-syllable words, and 17 out of 25 derive from Old English: only different, large, and important are from Latin. In terms of the words' meanings, great is higher in the ranking than big, probably because of its informal sense 'very good'. Little is surprisingly high at 7, as compared with small at 15. Bad is unexpectedly low at 23: is this because we have such a large choice of synonyms available for expressing 'bad things'?
