in Applied Linguistics
MAAL6018
Vocabulary Teaching and Learning
Session 2 Vocabulary size, frequency, word lists, and computer applications
Warming up
Number of word families
1.
2.
3.
4.
5.
Stage target
Cumulative target
In this session, we will look at (i) the vocabulary of English and (ii) the English vocabulary of
individuals. There are two senses of the word vocabulary at work here: (i) the words that belong to a
language and (ii) the words that its individual speakers know. These senses are, of course, related.
For example, an individual's vocabulary size represents a proportion of the vocabulary of the
language as a whole. Similarly, the vocabulary of a language is, in principle, the same as the set
of words that its speakers collectively know.
We will not only be concerned with vocabulary size, however, because some words are used far
more frequently than others. We will look at some estimates of word frequencies for English. We
will also look at some methods of estimating vocabulary size, which take account of frequency.
The second edition of the Oxford English Dictionary (1989) had 291,500 entries (in 20 volumes!).
Goulden, Nation & Read (1990) estimated that Webster's Third New International Dictionary
(published in 1961) contained around 267,000 entries and 54,000 word families. The latest edition
of Webster's Third New International Dictionary, unabridged, published in 2000, is believed to
contain over 472,000 entries.
Learners' dictionaries typically contain far fewer words than this. For example, Oxford Advanced
Learner's Dictionary (2005) stated it had 183,500 British and American words, phrases, and
meanings, while The Longman Dictionary of Contemporary English Online (2009) stated it had over
207,000 words, phrases, and meanings.
How many words does an average native speaker of English know? And how many words
do you know?
The most reliable estimates we have say that a university-educated native speaker will know
around 20,000 word families (Nation, 2001). Many researchers believe that native speakers acquire
about 1,000 words for every year of their lives, and so at the age of 5, they have an average
vocabulary of 4,000-5,000 word families. But estimates may be unreliable for two reasons. First, it is
difficult to say exactly what we mean by knowing a word (see Different dimensions of vocabulary
knowledge introduced in Session 1). Second, we have no reliable way of estimating how many
words a person knows and how well they know each word (i.e. measuring the quality or depth
of vocabulary knowledge).
For example, referring to some basic concepts introduced in Session 1, should we count book and
books as one word or two? What about the word bank in the Hang Seng Bank and in a river
bank?
One simple way that you can try yourself is to take a dictionary and count the number of words that
you know on 10 pages chosen at random. Divide the total by 10 and multiply by the number of pages
in the dictionary. Or try a more sophisticated method based on Goulden, Nation & Read (1990) (see
Schmitt, 2000, pp.7-8).
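The arithmetic of the simple dictionary method can be sketched in a few lines of Python. This is only an illustration: the function name, the page counts and the 1,500-page dictionary are made-up assumptions, not figures from any study.

```python
def estimate_vocabulary_size(known_per_page, total_pages):
    """Scale the mean number of known words per sampled page up to the whole dictionary."""
    if not known_per_page:
        raise ValueError("need at least one sampled page")
    mean_known = sum(known_per_page) / len(known_per_page)
    return round(mean_known * total_pages)


# Hypothetical counts of known words on 10 randomly chosen pages
sample_counts = [12, 9, 15, 11, 8, 14, 10, 13, 9, 11]

# Hypothetical dictionary length of 1,500 pages
estimate = estimate_vocabulary_size(sample_counts, total_pages=1500)
print(estimate)
```

The estimate is only as good as the sample: with so few pages, a different random selection could give a noticeably different figure.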
Nation's Vocabulary Levels Test (VLT) is a measure of the number of words that are known at
various levels of frequency. His VocabProfiler measures the number of words at each frequency
level that are used in a sample text. The researchers call this the lexical richness of the text.
Barber (1999, as cited in Fan, 2001) looked at students' vocabulary knowledge and their HKCEE
results and found a positive correlation between them.
Cobb and Horst (2000) investigated the number of high-frequency words known by tertiary
students and found that students knew all of the 2,000 most basic word families and most of the words
at the 3,000 level. However, the students did not do very well at the 5,000 word level and the UWL
level. [Available http://www.er.uqam.ca/nobel/r21270/cv/CitySize.html. This is one of the post-session readings.]
Fan (2001) investigated the vocabulary knowledge of tertiary students and found that there was a
positive and significant relationship between language proficiency and vocabulary scores. While
vocabulary was not a major problem for students from English-medium schools, students from
Chinese-medium schools and those who scored E in HKAL needed help with their vocabulary. (For
details, please read Fan (2001), which is one of the post-session readings.)
Text coverage

Vocabulary size     Text coverage
1,000               72%
2,000               79.7%
3,000               84%
4,000               86.8%
5,000               88.7%
6,000               89.9%
15,851              97.8%

Source: Francis and Kucera, 1982 (as cited in Nation & Waring, 1997, pre-session 2 reading)
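Coverage figures of this kind come from asking what share of the running words in a corpus is accounted for by the N most frequent words. A minimal Python sketch of that calculation follows. The function name and the twelve-word sample are assumptions for illustration only, and the sketch counts word forms, so its output is not comparable with the Brown Corpus figures above.

```python
from collections import Counter


def coverage(tokens, top_n):
    """Share of running words accounted for by the top_n most frequent words."""
    if not tokens:
        raise ValueError("need at least one token")
    counts = Counter(tokens)
    covered = sum(freq for _, freq in counts.most_common(top_n))
    return covered / len(tokens)


# Hypothetical mini-corpus of 12 running words
tokens = "the cat sat on the mat the dog sat on the log".split()

for n in (1, 3, 7):
    print(n, f"{coverage(tokens, n):.1%}")
```

Even in this toy sample the pattern in the table is visible: the first few words cover a large share of the text, and each further word adds less.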
Vocabulary Tests
There are well-researched vocabulary tests on the web that you can use to check the vocabulary size
or vocabulary knowledge of your students. Most of these tests are interactive, and some
also have printable versions. Some investigate students' passive vocabulary; others look
at students' productive use of the target vocabulary items.
A useful site is http://www.lextutor.ca/tests/,
which contains Nation's Vocabulary Levels Tests (VLT). These test vocabulary knowledge
at different levels.
Frequency
How many words are there in the following sentence, 7 or 8?
We need a vocabulary to talk about vocabulary.
The answer depends on whether we count vocabulary as one or two words. If we treat it as one
word, we are treating it as a type. If we treat it as two words, we are treating each occurrence as a
token. In this sentence, then, there are 7 types and 8 tokens.
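The type/token count for this sentence can be checked with a short Python sketch. The tokenizer here is a simplifying assumption: it lower-cases the text and treats any run of letters or apostrophes as a word.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


sentence = "We need a vocabulary to talk about vocabulary."
tokens = tokenize(sentence)   # every occurrence counts
types = set(tokens)           # each distinct word counts once

print(len(tokens), "tokens;", len(types), "types")
```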
Now imagine that you are looking at a corpus of texts containing 100,000 million tokens or more.
Can you see how it is possible to (a) count the exact number of tokens (also called the number of
running words), (b) count the number of types (i.e. the vocabulary size of the texts) and (c) rank
order the types according to their frequency (the number of times that each type occurs in the
texts)?
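All three steps are mechanical once the texts are stored on computer. The sketch below is one possible way of doing it in Python with the standard library. The function name and the two-sentence sample are assumptions for illustration; a real corpus would be read from files.

```python
import re
from collections import Counter


def frequency_list(text):
    """Return (number of tokens, number of types, types ranked by frequency)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Rank by descending frequency; break ties alphabetically for a stable order
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return len(tokens), len(counts), ranked


# Hypothetical sample standing in for a corpus
sample = "The cat sat on the mat. The dog sat on the log."
n_tokens, n_types, ranked = frequency_list(sample)

print(n_tokens, n_types)
for rank, (word, freq) in enumerate(ranked, start=1):
    print(rank, word, freq)
```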
Dictionary-makers and linguists now typically use a corpus (pl. corpora) to count tokens, types and
frequency. A corpus is a searchable collection of texts stored on computer. The larger and more
representative the corpus, the more we can say that the results reflect the frequency of words in
the language itself.
A concordancer is a tool that can be used to generate all the instances of a word (or 'text string') in
a corpus (a principled collection) of texts. The output shows you how the word you have chosen is
used in different contexts, e.g. discipline-specific (business, medical, journalistic, etc.), written, or
spoken. Concordancers mainly tell us about collocation or the ways in which words combine with
other words. For this reason they tell us a great deal about lexicogrammar (the interface between
vocabulary and grammar).
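To give a feel for what a concordancer produces, here is a minimal keyword-in-context sketch in Python. It is not how any particular concordancer is implemented: the function name, the context width and the three-sentence sample are all assumptions for illustration.

```python
import re


def concordance(text, keyword, width=30):
    """Return one line per occurrence of keyword, with `width` characters of context on each side."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the keyword lines up in a column
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Hypothetical sample text
sample = "She opened a bank account. The river bank was muddy. Banks lend money."
lines = concordance(sample, "bank")
for line in lines:
    print(line)
```

Because the search matches whole word forms only, Banks is not listed under bank here; a real concordancer would usually let you choose whether to search by word form or by family.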
It is important to remember, however, that frequency is simply a statistical measure. A word that
occurs frequently in one text may not occur frequently in another. Also, many common words
(e.g., apple, Monday, hello) are not especially frequent in text. The
frequency of a word in text is, therefore, not exactly the same as its 'importance' in the language.
What do you think are the ten most frequently used words in English?
Ten most frequent words in English
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
Frequency lists
Attempts to produce frequency lists for English actually go back to the 1950s (Carter, 1998). These
lists were mainly used for determining the vocabulary to be used in textbooks.
Appendix 1 shows you (part of) the 2,000 most frequent words in the Brown Corpus. The
Brown Corpus shows that the 2,000 most frequent word families account for about 80% of the
running words in written English text (Nation, 2001).
West (1953) developed a list of the 2,000 most frequent words in English called the General
Service List (GSL), words that are essential for everyday conversation (see Appendix 2 for the full
list of words, and Appendix 5 for a sample page of the GSL). West developed this list based on
previous researchers' lists in the hope of identifying a "core vocabulary" for foreign language
learners in order to make reading a less daunting task for them. Using this list of words, he
simplified graded readers by replacing the low-frequency words with these high-frequency words,
so that the input became more comprehensible to learners. Other than frequency, West also
selected words that had a wide "range" (words used in a wide range of topics). The list also
provides the different senses and the different parts of speech of the words, and their respective
frequency levels, which are useful information for learners.
Although the list has been criticized for being dated and for its loose classification of word families (e.g.
EFFECT: effective, effectively, efficient, efficiency, efficiently), some researchers have found that it is
still a valid list. Its main merits are that the selected words take into account (1)
frequency; (2) universality (words used in all countries); (3) utility (similar to the idea of "range", i.e.
words used to talk about a wide range of topics); and (4) usefulness (words used to define other words)
(Gilner, 2011).
Using a variety of sources, Nation (1990, 2001) and his colleagues have also compiled lists of the
most frequent 1,000, 2,000, 3,000, 5,000 and 10,000 word families, and a University Word List
(UWL) of the most frequent academic words in English (see Appendix 4), later revised by removing
words already existing in the AWL.
A streamlined version called the Academic Word List (AWL) (see Appendix 3) was developed by
Coxhead (2000) and it consists of 570 word families that are especially frequent in academic texts,
but not found in the 2,000 most frequent words.
Computer Applications
Tom Cobb's Compleat Lexical Tutor (http://www.lextutor.ca/)
Test (to get receptive and productive tests of various word levels)
List_Learn (to learn words at various levels with an online concordancer and dictionary; to get
lists of words from 1k to 20k level and AWL and UWL)
Vocab Profiler (VP Classic v. 4) (to see the vocab profile of one's writing / to predict the
readability of a text for learners)
% of words at 2000 word level
% of academic words
% of words from beyond the most frequent 2000
type-token ratio (or lexical richness)
Corpus-based Range checks whether a word is used more frequently in spoken or written
English in the Brown Corpus. It also checks the range of a word in any of the 15 sub-corpora of
the Brown Corpus, i.e. in which and how many of the 15 sub-corpora a word can be found. The
sub-corpora cover a wide range of domains such as press, academic, and fiction.
Text-based Range allows you to upload up to 25 texts of your own and check the range and
frequency of a word in these 25 texts.
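The idea behind a range check can be sketched in Python as follows. This is not Lextutor's own code: the function name and the three one-sentence "sub-corpora" are assumptions for illustration, standing in for the 15 Brown sub-corpora or for your own uploaded texts.

```python
import re


def word_range(word, subcorpora):
    """Return the names of the sub-corpora in which the word form occurs at least once."""
    target = word.lower()
    return [
        name
        for name, text in subcorpora.items()
        if target in re.findall(r"[a-z']+", text.lower())
    ]


# Hypothetical miniature sub-corpora
subcorpora = {
    "press": "The bank raised interest rates again this week.",
    "academic": "The data suggest a significant effect of frequency.",
    "fiction": "She walked along the river bank at dusk.",
}

found = word_range("bank", subcorpora)
print(found, f"range = {len(found)} of {len(subcorpora)}")
```

A word with a high range turns up in many different kinds of text, which is a separate question from how often it occurs overall.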
Task
Using VocabProfiler (http://www.lextutor.ca/vp/), now check the lexical richness of your
writing, and of the several other texts you have collected. Compare your results with those of a
classmate. What you should look for are the percentage of word families from beyond the most
frequent 2000, and the type-token ratio.
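If you would like to see how the two figures in this task are calculated, the Python sketch below works them out for one sentence. It rests on loud assumptions: the two word lists are tiny made-up stand-ins for the real 2,000-word and academic lists, and it matches word forms rather than word families, so its output will not match VocabProfiler's.

```python
import re

# Hypothetical stand-ins; a real profile would load the full frequency lists
FIRST_2000 = {"the", "of", "a", "is", "in", "students", "use", "words", "many", "this"}
ACADEMIC = {"analyse", "data", "research", "method"}


def profile(text):
    """Return percentages of tokens by list, plus the type-token ratio."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        raise ValueError("text contains no words")
    in_2000 = sum(token in FIRST_2000 for token in tokens)
    academic = sum(token in ACADEMIC for token in tokens)
    return {
        "pct_first_2000": 100 * in_2000 / total,
        "pct_academic": 100 * academic / total,
        "pct_beyond_2000": 100 * (total - in_2000) / total,
        "type_token_ratio": len(set(tokens)) / total,
    }


sample = "The students analyse the data in this research"
result = profile(sample)
for name, value in result.items():
    print(name, round(value, 3))
```

Note that the type-token ratio tends to fall as a text gets longer, so it is safest to compare texts of similar length.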
In Littlewood & Liu's (1996) study, the Vocab Profile showed that in a sample of HK students' writing, 89% of
the words used were within the first 2000. Other studies have shown that for native speaker
students the average was 83% and for scholars 65% (Laufer & Nation, 1995).
Please note that Vocabprofiler does not measure your vocabulary size directly. But you can compare
your results with those of others or on different samples of your writing written at different times
(there should be an improvement!). But please do not take the results too seriously! Vocabprofiler
measures vocabulary use in a text, not your general knowledge of vocabulary.
Discussion:
Other than frequency, what criteria can we use in selecting words for learners to learn?
References
Coxhead, A. (2000). A new Academic Word List. TESOL Quarterly, 34:2, 213-238.
Gilner, L. (2011). A primer on the General Service List. Reading in a Foreign Language, 23,
65-83. (available on the Internet)
Goulden, R., Nation, P. and Read, J. (1990). How large can a receptive vocabulary be? Applied
Linguistics, 11:4, 341-363.
Laufer, B. (1997). The lexical plight in second language reading: words you don't know,
words you think you know, and words you can't guess. In J. Coady and T. Huckin (Eds.)
Second Language Vocabulary Acquisition. Cambridge: Cambridge University Press, pp.
20-34.
Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written production,
Littlewood, W. and Liu, N.F. (1996) Hong Kong Students and their English (pp.5-7). Hong
Kong: Macmillan.
Nation, P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University
Press.
Schmitt, N. (2000). Vocabulary in language teaching. Cambridge: Cambridge
Pre-session 2 reading
Nation, P., & Waring, R. (1997). Vocabulary size, text coverage and word lists. In N. Schmitt & M.
McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 6-19). Cambridge:
Cambridge University Press.
Post-session 2 reading
Cobb, T. & Horst, M. (2000). Vocabulary size of some City University students.
Available http://www.er.uqam.ca/nobel/r21270/cv/CitySize.html.
Fan, M. Y. (2001). An investigation into the vocabulary needs of university students in Hong
Kong. Asian Journal of English Language Teaching, 11, 69-85.