
M.A. in Applied Linguistics
MAAL6018
Vocabulary Teaching and Learning
Session 2 Vocabulary size, frequency, word lists, and computer applications

Classroom for this session: CRT6.32


Preparation for this session:
Please bring at least one piece of your own writing of at least 300 words, and a few other texts of at
least 300 words, preferably in different genres (soft copies saved on a USB drive or in your email), to
this session.

Warming up

Number of word families
1. Number of words in the English language: ______
2. Number of words a university-educated native English speaker knows: ______
3. Number of words that you know: ______
4. Vocabulary size needed for basic communication (i.e., to express what one wants to express,
   however simply): ______
5. Vocabulary size needed for reading (understanding any written text): ______

New curriculum proposed by Education Bureau

An ongoing EMB/CUHK project (headed by Arthur McNeill) to develop an English vocabulary
curriculum for 12 years of compulsory education in Hong Kong:

Key stage               Stage target    Cumulative target
KS1 (by Primary 3)      ______          ______
KS2 (by Primary 6)      ______          ______
KS3 (by Secondary 3)    ______          ______
KS4 (by Secondary 6)    ______          ______
Remark: knowing a word here means being able to recognize a word and tell its meaning

Words selected with reference to:
- Frequency level in British National Corpus (BNC)
- General Service List (GSL)

Teacher representatives then further selected words based on their judgement about suitability of
the words for learners according to:
- Themes recommended in the Government's Curriculum Guides
- Vocabulary content of approved textbooks
- Other guidelines set by the research team (e.g. whether the words are used in Hong Kong, ease
  of learning, etc.)

Vocabulary size and frequency

In this session, we will look at (i) the vocabulary of English and (ii) the English vocabulary of
individuals. There are two senses of the word "vocabulary" at work here: (i) the words that belong to a
language and (ii) the words that its individual speakers know. These senses are, of course, related.
For example, an individual's vocabulary size represents a proportion of the vocabulary of the
language as a whole. Similarly, the vocabulary of a language is, in principle, the set of words that
its speakers collectively know.

We will not be concerned only with vocabulary size, however, because some words are used far
more frequently than others. We will look at some estimates of word frequencies for English. We
will also look at some methods of estimating vocabulary size that take account of frequency.

How many words?


The following are some answers to the first three questions in the quiz you did in Session 1. You will
readily see that the questions are themselves problematic in that they do not specify exactly what is
meant by a word. You may also think about the problems involved in saying exactly what it means
to know a word.

How many words are there in the English language?


According to Schmitt (2000, p.3), estimates in published work range from 1 million to 2 million due
to different definitions of what is meant by a word and hence how words are grouped into one
entry.

The second edition of the Oxford English Dictionary (1989) had 291,500 entries (in 20 volumes!).
Goulden, Nation & Read (1990) estimated that Webster's Third New International Dictionary
(published in 1961) contained around 267,000 entries and 54,000 word families. The latest edition
of Webster's Third New International Dictionary, unabridged, published in 2000, is believed to
contain over 472,000 entries.

Learners' dictionaries typically contain far fewer words than this. For example, the Oxford Advanced
Learner's Dictionary (2005) stated it had 183,500 British and American words, phrases, and
meanings, while the Longman Dictionary of Contemporary English Online (2009) stated it had over
207,000 words, phrases, and meanings.

How many words does an average native speaker of English know? And how many words
do you know?
The most reliable estimates we have say that a university-educated native speaker will know
around 20,000 word families (Nation, 2001). Many researchers believe that native speakers acquire
about 1,000 words for every year of their lives, and so at the age of 5, they have an average
vocabulary of 4,000-5,000 word families. But estimates may be unreliable for two reasons. First, it is
difficult to say exactly what we mean by knowing a word (see "Different dimensions of vocabulary
knowledge" introduced in Session 1). Second, we have no reliable way of estimating how many
words a person knows and how well they know each word (i.e. measuring the quality or depth
of vocabulary knowledge).

For example, referring to some basic concepts introduced in Session 1, does knowing "book" and
"books" count as knowing one word or two? What about the word "bank" in "the Hang Seng Bank" and
in "a river bank"?

One simple way that you can try yourself is to take a dictionary and count the number of words that
you know on 10 pages chosen at random. Divide the total by 10 and multiply by the number of pages
in the dictionary. Or try a more sophisticated method based on Goulden, Nation & Read (1990) (see
Schmitt, 2000, pp.7-8).
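
The simple page-sampling method described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name, the page counts and the dictionary length are invented for the example, and the result is a rough estimate that depends heavily on which dictionary you use and on what you count as "knowing" a word.

```python
def estimate_vocabulary_size(known_per_page, total_pages):
    """Estimate vocabulary size from a sample of dictionary pages.

    known_per_page: number of known words counted on each sampled page.
    total_pages: number of pages in the whole dictionary.
    """
    if not known_per_page:
        raise ValueError("at least one sampled page is needed")
    # Average number of known words per sampled page ...
    mean_known = sum(known_per_page) / len(known_per_page)
    # ... scaled up to the whole dictionary.
    return round(mean_known * total_pages)


# Hypothetical counts for 10 randomly chosen pages of a 1,500-page dictionary.
sample_counts = [12, 9, 15, 11, 8, 14, 10, 13, 9, 11]
estimate = estimate_vocabulary_size(sample_counts, total_pages=1500)
print(estimate)
```

With these invented figures the average is 11.2 known words per page, which scales up to an estimate of 16,800 words.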

Nation's Vocabulary Levels Test (VLT) is a measure of the number of words that are known at
various levels of frequency. VocabProfiler measures the number of words at each frequency
level that are used in a sample text. The researchers call this the lexical richness of the text.

How many words would an average student in your classes know?


This will of course depend on a number of factors, such as your students' age, the number of years
they have been studying English, and their level of language proficiency. Littlewood & Liu (1996)
tested a sample of 40 first-year HKU/CUHK students and found that on average the students knew
around 3,500 words.

Barber (1999, as cited in Fan, 2001) looked at students' vocabulary knowledge and their HKCEE
results and found a positive correlation between them.

Cobb and Horst (2000) investigated the number of high-frequency words known by tertiary
students and found that students knew all of the 2,000 most basic word families and most of the
words at the 3,000 level. However, the students did not do very well at the 5,000 word level or the
UWL level. [Available at http://www.er.uqam.ca/nobel/r21270/cv/CitySize.html. This is one of the
post-session readings.]

Fan (2001) investigated the vocabulary knowledge of tertiary students and found that there was a
positive and significant relationship between language proficiency and vocabulary scores. While
vocabulary was not a major problem for students from English-medium schools, students from
Chinese-medium schools and those who scored E in HKAL needed help with their vocabulary. (For
details, please read Fan (2001), which is one of the post-session readings.)

How much vocabulary should be learnt at different stages of proficiency?


Research on vocabulary size offers some answers to this question (Schmitt, 2000, p. 142).
Laufer (1988) suggests that, in order to understand a text, 95% of the words must be known, i.e.
the reader meets about one unknown word in every 20 words read. Below this level, the text becomes
increasingly incomprehensible.
Word frequency studies suggest, however, that as long as students have a knowledge of the 2,000
most frequent words + the 570 words of the AWL + the technical vocabulary that they will acquire
through study of a particular topic, they should reach the 95% threshold for known words in most
of the texts they read. A non-native speaker of English can, therefore, do well in academic study
with a relatively small but well-chosen vocabulary, at least in theory!
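
To make the 95% threshold concrete, the following Python sketch computes the percentage of running words in a text that appear in a list of known words. It is a simplified illustration, not how any published study measured coverage: the function names, the tiny word list and the sample sentence are invented, and the code matches exact word forms rather than word families.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def text_coverage(text, known_words):
    """Return the percentage of running words in `text` found in `known_words`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return 100 * known / len(tokens)


# Hypothetical learner vocabulary and sample sentence.
known_words = {"the", "cat", "sat", "on", "a", "mat", "and", "looked", "at", "dog"}
text = "The cat sat on the mat and looked at the ferocious dog."

coverage = text_coverage(text, known_words)
print(f"{coverage:.1f}% of running words known")
print("Meets the 95% threshold:", coverage >= 95)
```

Here 11 of the 12 running words are known (about 91.7%), so a single unknown word in a short sentence is already enough to fall below the threshold.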
Vocabulary size and text coverage in the Brown Corpus

Vocabulary size    Text coverage
1,000              72%
2,000              79.7%
3,000              84%
4,000              86.8%
5,000              88.7%
6,000              89.9%
15,851             97.8%

Source: Francis and Kucera, 1982 (as cited in Nation & Waring, 1997, pre-session 2 reading)

Vocabulary Tests
There are well-researched vocabulary tests on the web that you can use to check the vocabulary
size or vocabulary knowledge of your students. Most of these tests are interactive, while some
also have printable versions. Some investigate students' passive vocabulary; others look
at students' productive use of the target vocabulary items.
A useful site is http://www.lextutor.ca/tests/, which contains Nation's Vocabulary Levels
Tests (VLT) that test vocabulary knowledge at different levels.

Strategies for learning words of different frequency levels:

5,000 Word Level (general vocabulary)


- Training at guessing words in context
- Wide general reading: novels, newspapers and magazines
- Intensive reading of a variety of texts
- Advanced English Vocabulary workbooks
University Word List (specialised academic vocabulary)
- Learn the words on the University Word List (Nation 1990)
- Academic Word List (Coxhead, 2000)
- Intensive reading of university texts
10,000 Word Level (a wide, general vocabulary)
- Activities similar to the 5,000 word level
- Combined with learning prefixes and roots

Does size matter?


It is clearly a difficult matter to measure the size of the vocabulary of a language or the number of
words that an individual knows. Estimates must always be approximate, the more so as the numbers
increase. So is there any point in trying to measure vocabulary?

Here are several ways in which such measurements may be useful:


- As a guide to the inclusion of vocabulary in textbooks, readers and dictionaries
- As a means of assessing the difficulty of texts, or the difficulties that students are likely to
  encounter when they read them
- As a means of assessing proficiency (vocabulary size appears to be a good indicator of other
  dimensions of proficiency), e.g. in entrance tests

Frequency
How many words are there in the following sentence, 7 or 8?
We need a vocabulary to talk about vocabulary.
The answer depends on whether we count "vocabulary" as one or two words. If we treat it as one
word, we are treating it as a type. If we treat it as two words, we are treating each occurrence as a
token. In this sentence, then, there are 7 types and 8 tokens.

Now imagine that you are looking at a corpus of texts containing 100 million tokens or more.
Can you see how it is possible to (a) count the exact number of tokens (also called the number of
running words), (b) count the number of types (i.e. the vocabulary size of the texts) and (c) rank
order the types according to their frequency (the number of times that each type occurs in the
texts)?
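
The three counts (tokens, types, and a frequency ranking) can be illustrated with a short Python sketch applied to the example sentence above. The function name and the simple tokenizer are assumptions made for the illustration; real corpus tools make more careful decisions about hyphens, numbers, capitalisation and so on.

```python
import re
from collections import Counter


def count_words(text):
    """Return (number of tokens, number of types, types ranked by frequency)."""
    # Tokens: every running word, lowercased so "We" and "we" count as one type.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Types: each distinct word form, with the number of times it occurs.
    frequencies = Counter(tokens)
    return len(tokens), len(frequencies), frequencies.most_common()


n_tokens, n_types, ranked = count_words("We need a vocabulary to talk about vocabulary.")
print(n_tokens, "tokens")
print(n_types, "types")
print("Most frequent type:", ranked[0])
```

For this sentence the sketch reports 8 tokens and 7 types, with "vocabulary" (2 occurrences) at the top of the frequency ranking. The same steps apply unchanged to a corpus of millions of words, which is why such counts are done by computer.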

Dictionary-makers and linguists now typically use a corpus (pl. corpora) to count tokens, types and
frequency. A corpus is a searchable collection of texts stored on computer. The larger and more
representative the corpus, the more we can say that the results reflect the frequency of words in
the language itself.
A concordancer is a tool that can be used to generate all the instances of a word (or 'text string') in
a corpus (a principled collection) of texts. The output shows you how the word you have chosen is
used in different contexts, e.g. discipline-specific (business, medical, journalistic, etc.), written, or
spoken. Concordancers mainly tell us about collocation, or the ways in which words combine with
other words. For this reason they tell us a great deal about lexicogrammar (the interface between
vocabulary and grammar).
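
The basic idea of a concordancer can be sketched in Python as a "keyword in context" listing. This is a toy illustration only: the function name, the two-sentence corpus and the three-word context window are invented, and real concordancers such as the one on Lextutor offer sorting, much larger corpora and many more options.

```python
import re


def concordance(texts, keyword, width=3):
    """Return keyword-in-context lines: up to `width` words either side of each hit."""
    lines = []
    for text in texts:
        tokens = re.findall(r"[A-Za-z']+", text)
        for i, token in enumerate(tokens):
            if token.lower() == keyword.lower():
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Hypothetical two-text corpus showing two senses of "bank".
corpus = [
    "She opened an account at the bank on Monday.",
    "They had a picnic on the bank of the river.",
]
for line in concordance(corpus, "bank"):
    print(line)
```

Even in this tiny example the words on either side of the keyword ("account at the", "of the river") show which sense of the word is in use, which is the kind of collocational information a concordance makes visible.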

It is important to remember, however, that frequency is simply a statistical measure. A word that
occurs frequently in one text may not occur frequently in another. Also, many common words
(e.g., apple, Monday, hello) are not especially frequent in text. The frequency of a word in text is,
therefore, not exactly the same as its 'importance' in the language.

What do you think are the ten most frequently used words in English?
Ten most frequent words in English
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.

Frequency lists
Attempts to produce frequency lists for English actually go back to the 1950s (Carter, 1998). These
lists were mainly used for determining the vocabulary to be used in textbooks.
Appendix 1 shows you (part of) the 2,000 most frequent words in the Brown Corpus. The
Brown Corpus shows that the 2,000 most frequent word families account for about 80% of the
running words in any English written text (Nation, 2001).
West (1953) developed a list of the 2,000 most frequent words in English, words that are essential
for everyday conversation, called the General Service List (GSL) (see Appendix 2 for the full list of
words, and Appendix 5 for a sample page). West based this list on previous researchers' lists,
hoping to identify a "core vocabulary" for foreign language learners that would make reading a less
daunting task. Using this list, he simplified graded readers by replacing low-frequency words with
these high-frequency words, so that the input became more comprehensible to learners. Besides
frequency, West also selected words that had a wide "range" (words used across a wide range of
topics). The list also gives the different senses and parts of speech of each word, with their
respective frequency levels, which is useful information for learners.
Although the GSL has been criticized for being dated and for its loose classification of word families
(e.g. EFFECT: effective, effectively, efficient, efficiency, efficiently), some researchers have shown
that it is still a valid list. Its main merits are that the selected words take into account (1)
frequency; (2) universality (words used in all countries); (3) utility (similar to the idea of "range", i.e.
words used to talk about a wide range of topics); and (4) usefulness (words used to define other
words) (Gilner, 2011).
Using a variety of sources, Nation (1990, 2001) and his colleagues have also compiled lists of the
most frequent 1,000, 2,000, 3,000, 5,000 and 10,000 word families, and a University Word List
(UWL) of the most frequent academic words in English, later revised by removing words already
in the AWL (see Appendix 4).
A streamlined version called the Academic Word List (AWL) (see Appendix 3) was developed by
Coxhead (2000) and it consists of 570 word families that are especially frequent in academic texts,
but not found in the 2,000 most frequent words.


Computer Applications
Tom Cobb's Compleat Lexical Tutor (http://www.lextutor.ca/)
- Test (to get receptive and productive tests of various word levels)
- List_Learn (to learn words at various levels with an online concordancer and dictionary; to get
  lists of words from 1k to 20k level and AWL and UWL)
- Vocab Profiler (VP Classic v. 4) (to see the vocab profile of one's writing / to predict the
  readability of a text for learners), which reports:
  - % of words at 2000 word level
  - % of academic words
  - % of words from beyond the most frequent 2000
  - type-token ratio (or lexical richness)
- Corpus-based Range checks whether a word is used more frequently in spoken or written
  English in the Brown Corpus. It also checks the range of a word in any of the 15 sub-corpora of
  the Brown Corpus, i.e. in which and how many of the 15 sub-corpora a word can be found. The
  sub-corpora cover a wide range of domains such as press, academic, and fiction.
- Text-based Range allows you to upload up to 25 texts of your own and check the range and
  frequency of a word in these 25 texts.
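
The notion of "range" can be illustrated with a small Python sketch that reports in which of a set of texts a word occurs. This is not the Lextutor implementation: the function name and the three one-sentence "sub-corpora" are invented stand-ins for real collections of texts.

```python
import re


def word_range(word, sub_corpora):
    """Return the names of the sub-corpora in which `word` occurs at least once."""
    word = word.lower()
    found = []
    for name, text in sub_corpora.items():
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        if word in tokens:
            found.append(name)
    return found


# Hypothetical miniature sub-corpora; real ones contain many full texts.
sub_corpora = {
    "press": "The minister will analyse the budget figures on Monday.",
    "academic": "We analyse the data and report the results.",
    "fiction": "She smiled and closed the door behind her.",
}

print(word_range("analyse", sub_corpora))
print(len(word_range("the", sub_corpora)), "of", len(sub_corpora), "sub-corpora contain 'the'")
```

In this example "analyse" has a range of 2 (press and academic), while "the" occurs in all 3 texts. A word with a wide range is useful across many topics, which is one reason West considered range alongside frequency.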

Task
Using VocabProfiler (http://www.lextutor.ca/vp/), now check the lexical richness of your
writing and of the other texts you have collected. Compare your results with those of a
classmate. The figures to look for are the percentage of word families from beyond the most
frequent 2,000, and the type-token ratio.
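
To see roughly what these figures mean, here is a simplified Python sketch of a vocabulary profile. It is an illustration under stated assumptions, not VocabProfiler's actual method: the function name and the two tiny word lists are invented placeholders for the real 2,000-word and academic lists, and the code matches exact word forms rather than word families.

```python
import re


def vocab_profile(text, first_2000, academic):
    """Return percentages of running words by list, plus the type-token ratio."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no words")
    total = len(tokens)
    in_2000 = sum(1 for t in tokens if t in first_2000)
    # Academic words are counted only if they are not already in the first 2,000.
    in_academic = sum(1 for t in tokens if t not in first_2000 and t in academic)
    return {
        "pct_first_2000": 100 * in_2000 / total,
        "pct_academic": 100 * in_academic / total,
        "pct_beyond_2000": 100 * (total - in_2000) / total,
        "type_token_ratio": len(set(tokens)) / total,
    }


# Hypothetical miniature word lists standing in for the real ones.
first_2000 = {"the", "of", "a", "is", "this", "study", "new", "on", "words"}
academic = {"analysis", "data", "method"}
text = "This study is a new analysis of the data on the acquisition of words"

profile = vocab_profile(text, first_2000, academic)
for name, value in profile.items():
    print(f"{name}: {value:.2f}")
```

For this 14-word sentence, 11 running words are on the first list, 2 are on the academic list and 1 ("acquisition") is on neither, so about 21% of the running words come from beyond the most frequent 2,000. There are 12 types, giving a type-token ratio of about 0.86. Note that the type-token ratio falls as texts get longer, so it is best compared across texts of similar length.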

In Littlewood & Liu's (1996) study, the Vocab Profile showed that in a sample of HK students'
writing, 89% of the words used were within the first 2,000. Other studies have shown that for native
speaker students the average was 83% and for scholars 65% (Laufer & Nation, 1995).

Please note that VocabProfiler does not measure your vocabulary size directly. You can, however,
compare your results with those of others, or across samples of your writing written at different
times (there should be an improvement!). But please do not take the results too seriously!
VocabProfiler measures vocabulary use in a text, not your general knowledge of vocabulary.

Discussion:
Other than frequency, what criteria can we use in selecting words for learners to learn?

References
Coxhead, A. (2000) A New Academic Word List. TESOL Quarterly, 34:2, 213-238.
Gilner, L. (2011). A primer on the General Service List. Reading in a Foreign Language, 23,
65-83. (available on the Internet)
Goulden, R., Nation, P. and Read, J. (1990) How large can a receptive vocabulary be? Applied
Linguistics, 11:4, 341-363.
Laufer, B. (1997). The lexical plight in second language reading: words you don't know,
words you think you know, and words you can't guess. In J. Coady and T. Huckin (Eds.)
Second Language Vocabulary Acquisition. Cambridge: Cambridge University Press, pp.
20-34.
Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written production.
Applied Linguistics, 16(3), 307-322.

Littlewood, W. and Liu, N.F. (1996) Hong Kong Students and their English (pp.5-7). Hong
Kong: Macmillan.
Nation, P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University
Press.
Schmitt, N. (2000). Vocabulary in language teaching. Cambridge: Cambridge University Press,
pp. 7-8.


West, M. (1953). A general service list of English words. London: Longman, Green & Co.

Pre-session 2 reading
Nation, P., & Waring, R. (1997). Vocabulary size, text coverage and word lists. In N. Schmitt & M.
McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 6-19). Cambridge:
Cambridge University Press.

Post-session 2 reading
Cobb, T. & Horst, M. (2000). Vocabulary size of some City University students.
Available http://www.er.uqam.ca/nobel/r21270/cv/CitySize.html.

Fan, M. Y. (2001). An investigation into the vocabulary needs of university students in Hong
Kong. Asian Journal of English Language Teaching, 11, 69-85.
