
THE CHALLENGE OF CONSTRUCTING A RELIABLE WORD LIST: AN EXPLORATORY CORPUS-BASED ANALYSIS OF INTRODUCTORY PSYCHOLOGY TEXTBOOKS by Donald Patrick Miller

A Dissertation Submitted in Partial Fulfillment of the Requirements for the degree of Doctor of Philosophy in Applied Linguistics Northern Arizona University August 2012

Approved:
Douglas Biber, PhD, Co-Chair
Randi Reppen, PhD, Co-Chair
William Grabe, PhD
Fredricka L. Stoller, EdD
Yuly Asención Delaney, PhD


ABSTRACT

THE CHALLENGE OF CONSTRUCTING A RELIABLE WORD LIST: AN EXPLORATORY CORPUS-BASED ANALYSIS OF INTRODUCTORY PSYCHOLOGY TEXTBOOKS Donald Patrick Miller

Acknowledging the important role of reading in the university curriculum, and the important role that vocabulary plays in successful reading comprehension, researchers have directed a great deal of effort toward identifying ways that teachers and learners of English for Academic Purposes (EAP) can maximize vocabulary development efforts. Due to the exceptionally large stock of vocabulary in active use in English, these efforts include several important areas of inquiry. Two notable areas pose the following questions: a.) How many words do learners need in order to accomplish target language use tasks?; and b.) What words will learners most likely and most frequently encounter in the course of interacting in their target language use domain? The answer to the first question reveals the nature of the vocabulary learning challenge and can help educators set curricular goals. Answers to the second question can help identify the words that merit instructional and learning focus. The broad goal of this dissertation is to highlight the methodological challenges inherent in identifying the scope of the lexical challenge and in reliably capturing meaningful sets of words for instructional focus. This goal is accomplished through a corpus-based analysis of lexical distributions in one narrow academic domain: introductory psychology textbooks.

A 3.1 million-word corpus of 10 complete introductory psychology textbooks was compiled with the goal of representing lexical distributions in my target domain: required readings in introductory psychology coursework at the undergraduate level. The corpus was then analyzed to determine the extent to which it captures the lexical diversity that students might encounter in introductory psychology textbooks. Additionally, samples of different sizes from the corpus were analyzed to determine the extent to which they could capture a reliable set of important words: words that students will most likely and most frequently encounter while reading these textbooks. Findings from this study suggest that, while comparatively large, and, thus, presumably representative of the lexical variability in introductory psychology textbooks, this corpus does not in fact completely represent the lexical variability of the target domain. There is likely lexical diversity in this domain that is not captured by this corpus. Additionally, no sample from the corpus, regardless of size, was able to represent the lexical distribution of the whole corpus, as demonstrated by the samples' inability to reliably capture the important words identified in the whole corpus. Implications of these findings are discussed in relation to previous corpus-based vocabulary research. Specifically, findings raise questions regarding the representativeness of previously designed academic corpora and the reliability of previously proposed word lists based on these academic corpora. Important implications for corpus-based vocabulary research as well as for academic vocabulary instruction and learning are also discussed.


© Don Miller 2012


ACKNOWLEDGEMENTS

I am extremely happy to be able to acknowledge several people who have been instrumental throughout all stages of this dissertation project. First and foremost, I would like to thank Dr. Doug Biber and Dr. Randi Reppen, Co-Chairs of my dissertation committee, for their impossibly generous support and guidance throughout this process. They were always available for discussions, which helped me to think more broadly, deeply, and boldly, and, ultimately, to focus my often scattered thoughts. It was because of their guidance and encouragement that I was able to make the important connections needed to tie this dissertation together. I feel truly lucky to have had their support and their expertise. I also feel extremely grateful to my dissertation committee members, Dr. William Grabe, Dr. Fredricka Stoller, and Dr. Yuly Asención Delaney, for their careful reading of my dissertation proposal and dissertation draft. Their insightful questions and suggestions guided me toward clarifying and developing key points in my study. And I greatly appreciate the numerous, relevant resources that they shared at multiple times throughout the process. I would also like to acknowledge Chris and Bethany Gray for the extraordinary amount of time that they shared during the development of the vocabulary analysis program used for this dissertation. The analyses conducted in this dissertation would have been impossible without their tremendous programming expertise and limitless patience. Finally, I would like to thank my wife, Li-Chiu. It was because of her encouragement that I even began this doctoral program, because of her extremely hard work that we were able to afford this program, and because of her unending patience, support, and love that I was able to finally finish this dissertation.


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
1.1 Vocabulary and Academic Success
1.2 The Quest for Word Lists
1.3 A Methodological Problem
1.4 Research Questions
2 REVIEW OF THE LITERATURE
2.1 The Construct Word and the Classification of Words in Academic Texts
2.2 How Many Words Do Learners Need?
2.3 Historical Overview of Word List Development Studies
2.4 Operationalizing Representativeness in Corpus Design
3 METHODOLOGY
3.1 Overview of Research Design
3.2 Constructing the Corpus
3.3 The Unit of Analysis
3.4 Lexical Analysis Procedures
4 RESULTS AND DISCUSSION PART 1: LEXICAL DIVERSITY IN INTRODUCTORY PSYCHOLOGY TEXTBOOKS
4.1 Lexical Diversity across Textbooks in the PSYTB Corpus
4.2 Understanding Lexical Diversity in the Target Domain
4.3 Chapter 4 Conclusions
5 RESULTS AND DISCUSSION PART 2: RELIABLY CAPTURING THE "IMPORTANT" WORDS IN THE PSYTB CORPUS
5.1 Introduction to the Analysis
5.2 Capturing the "Important" Words in the PSYTB Corpus
5.3 Manipulating the Inclusion Criteria
5.4 Chapter 5 Conclusions
6 GENERAL DISCUSSION OF FINDINGS
6.1 Summary and Discussion of Major Findings
6.2 Implications of Findings
6.3 Acknowledgement of Major Limitations
6.4 Directions for Future Related Research
7 CONCLUSION
REFERENCES
APPENDIX A: Carnegie Classifications
APPENDIX B: Academic Institutions from which Textbooks were Selected

LIST OF TABLES

1. Cumulative Percentage Coverage Figures for Lady Chatterley's Lover by the Fourteen 1,000 Word-families from the BNC, with and without Proper Nouns
2. Responses from Required Reading Survey
3. Proportion of Total Lemmas in Different Book Proportions Sampled
4. Design of the Introductory Psychology Textbook Corpus
5. Comparison of the PSYTB with the Academic Corpus (Coxhead, 2000)
6. Partial Sample of Output from Vocabulary Analysis Program
7. Number of Lemmas Required to Achieve Different Levels of Word Coverage (by book)
8. Book Range Distribution of Non-proper Noun Lemmas in the PSYTB Corpus
9. Number of Non-proper Noun Lemmas Unique to Each Textbook
10. Partial Comparison of Lists Produced by Samples of One Textbook with Lists Produced by Whole Corpus (50% chapter range requirement)
11. Comparison of Lists Produced by Samples of Two through Nine Textbooks with Lists Produced by Whole Corpus (50% chapter range requirement)
12. Comparison of Words Meeting "Importance" Criteria in a Sample of 3 Textbooks with Words Meeting "Importance" Criteria in the Whole Corpus
13. Comparison of Lists Produced by One Whole Textbook with List Produced by Whole Corpus (50% chapter range requirement; various minimum frequency requirements)
14. Comparison of Lists Produced by Samples of Two through Nine Textbooks with Lists Produced by Whole Corpus (50% chapter range requirement)
15. Comparison of Lists Produced by Samples and by Whole Corpus when Minimum Frequency Requirements are Adjusted for Samples
16. Comparison of Lists Produced by Samples with Lists Produced by Whole Corpus when Minimum Frequency Requirements are Adjusted for Samples (25% chapter range requirement)
17. Comparison of Lists Produced by Samples with Lists Produced by Whole Corpus when Minimum Frequency Requirements are Adjusted for Samples (75% chapter range requirement)

LIST OF FIGURES

1. Classification of Words in Academic Writing
2. Two Hypothetical Lexical Growth Curves Illustrating Two Possible Scenarios
3. Lexical Growth Curve for PSYTB (not including proper nouns)
4. Comparing Lists of Important Words from the Whole Corpus with the Lists Identified in the Samples

Chapter 1 Introduction

1.1 Vocabulary and Academic Success

It has been argued convincingly that reading is among the most important skills required for academic success (e.g., Carson et al., 1992; Rosenfeld, Leung, & Oltman, 2001). Although the reading demands across disciplines vary to a considerable degree, surveys of undergraduate coursework consistently find that most courses (aside from, perhaps, composition courses) are based around content from assigned readings. These surveys make it clear that reading-related tasks are among those that instructors deem most important for student success. What, then, can English for academic purposes (EAP) teachers do to facilitate their learners' academic reading skill development? Undoubtedly, many factors are involved in successful reading comprehension, from a reader's use of metacognitive strategies to their ability to parse complex grammatical structures in a text (e.g., Cain, 2007; Grabe & Stoller, 2002; Sheorey & Mokhtari, 2001; Shiotsu & Weir, 2007). Among these many factors, there exists an extremely high correlation between reading proficiency and vocabulary skills in both L1 and L2 contexts (for an overview and synthesis of related studies, see Grabe, 2004; 2009). For example, with a specific focus on the relationship between vocabulary and successful academic reading, Qian (2002) noted a strong correlation between scores on vocabulary assessments and scores on the reading section of the Test of English as a Foreign Language (TOEFL). Thus, a focus on vocabulary development is clearly one crucial part of this puzzle (Anderson, 2008; Grabe, 2009; Laufer, 1992; Nation, 2001).

In order to succeed at the tertiary level, then, a student must have strong reading skills, which, in turn, require a broad receptive vocabulary. How broad a receptive vocabulary? Synthesizing previous research on the vocabulary of the English language, Grabe (2009) suggests that there are "well over 100,000 words in active use in English" (p. 269), not counting common names or multi-word units, and likely over 1,000,000 words in English if one considers less frequent technical or scientific words. Fortunately for learners, not all of these words are encountered in schooling. According to Nagy and Anderson (1984), students in a U.S. educational context are exposed to anywhere from 61,934 to 88,533 distinct word families (i.e., base words plus inflectional and most morphological variants) by the 9th grade. Nation (2001) proposes that university-level L1 readers have a receptive vocabulary of approximately 20,000 word families. The primary reason for the difference in these two figures is that Nagy and Anderson's (1984) figures represent the number of words to which readers are exposed, while Nation's (2001) figure denotes words that readers recognize and understand.

1.2 The Quest for Word Lists

Fortunately, learners of English as a second language, as well as those who work with these learners, can take some degree of comfort in the fact that most tasks can be accomplished with far fewer than 20,000 word families. This has been demonstrated through a number of studies. Adolphs and Schmitt (2003), for example, conclude that approximately 3,000 word families (or 5,000 individual word forms) will allow learners to participate in spoken discourse. Nation (2006) proposes that a vocabulary of 8,000 to 9,000 word families (plus proper nouns) is needed to read a novel or a newspaper, 7,000 word families to comprehend a children's movie, and 6,000 to 7,000 word families to comprehend unscripted spoken interactions. With specific regard to comprehending written academic texts, Hazenberg and Hulstijn (1996) propose that the figure is 10,000 words, and Laufer and Ravenhorst-Kalovski (2010) suggest that the figure is between 6,000 and 8,000 words (see Chapter 3 for a more comprehensive and detailed discussion of studies related to vocabulary demands of different tasks). Even this smaller target is understandably daunting for EAP teachers and learners. It is simply not feasible for an EAP program of reasonable length to expect to cover such a long list of words. In fact, it is unlikely that learners can increase their vocabulary by more than perhaps 2,000 words per year through intensive, direct instruction (Grabe, 2009), and this might even be an optimistic goal. Fortunately, we know that a much more restricted set of words does a great deal of work in English, at least in terms of word coverage. Thus, for well over half a century, researchers have expended considerable time and energy in pursuit of lists of important vocabulary (vocabulary that is frequently and widely used in English) in order to help learners, teachers, and materials developers focus and maximize the efforts of language learning. This robust tradition of word list development research has produced some extremely influential lists. Perhaps the most widely known and commonly used of these lists is the General Service List (GSL; West, 1953) of 2,000 word families, which accounts for approximately 80% word coverage of most texts (Nation, 2001). More recent studies have sought to identify lists of important vocabulary for more narrowly
defined domains, particularly with regard to language at varying levels of specificity within academic English. Among early attempts at devising general academic vocabulary lists were those made by Campion and Elley (1971), Praninskas (1972), Lynn (1973), and Ghadessy (1979). More recent, and perhaps more widely known, efforts produced the University Word List (UWL; Xue and Nation, 1984) and the Academic Word List (AWL; Coxhead, 2000). Even more recent studies have in fact begun to question the efficacy of a single, "one-size-fits-all" list of general academic vocabulary (e.g., Hyland & Tse, 2007). Noting some inconsistencies in coverage provided by the AWL across disciplines, as well as some specialized uses of vocabulary across disciplines, these studies have led researchers to make a case for more targeted, discipline-specific vocabulary lists. These researchers have proposed changes reflecting disciplinary variation, ranging from the need for modest modifications to the AWL, such as removing items from and/or adding items to the AWL (e.g., Martinez et al., 2009; Chen & Ge, 2007), to the need for completely new discipline-specific lists (e.g., for public health: Millar & Budgell, 2008; for medicine: Wang, Liang, & Ge, 2008). Designing word lists makes intuitive sense: doing so can help focus program curricula or individual efforts toward those lexical items that students will encounter most often, thus, presumably, increasing chances for successful reading comprehension. Such was the expressed goal of Coxhead's (2000) research, which resulted in the AWL. Noting that among the most challenging aspects of vocabulary learning and teaching in English for academic purposes (EAP) programmes is "making principled decisions about which words are worth focusing on during valuable class and independent study time" (p. 312),
Coxhead proposed that her word list "might be used to set vocabulary goals for EAP courses, construct relevant teaching materials, and help students focus on useful vocabulary items" (p. 227). Additionally, she expressed the hope that "authors will undertake to write [...course books specifically designed to teach academic vocabulary...] based on the AWL" (p. 227). Indeed, the AWL has since figured prominently in EAP syllabi and popular published teaching materials. Many course books have been entirely based on her list (e.g., Inside Reading 1 Student Book Pack: The Academic Word List in Context, Burgmeier & Zimmerman, 2007; Essential Academic Vocabulary: Mastering the Complete Academic Word List, Huntley, 2005; Focus on Vocabulary: Mastering the Academic Word List, Schmitt & Schmitt, 2005) or have drawn significantly from this list to inform the vocabulary component of instruction (e.g., Reading Skills for Success: A Guide to Academic Texts, Upton, 2004).

1.3 A Methodological Problem

The vocabulary research noted above has allowed for great strides in our
understanding of important vocabulary for a variety of purposes, and, as noted, findings have been applied directly to the development of curricula and instructional materials. Teachers and learners no doubt take comfort in, and have benefited from, the empirical basis for their increasingly focused efforts. Simply stated, a great deal of faith has been placed in these lists, particularly the GSL and the AWL. For example, in the introduction to her textbook on essential academic vocabulary, Huntley (2005) contends that, "After studying and practicing the words in this textbook [i.e., the AWL],
you should be able to read an academic text with knowledge of about 95% of the words"¹ (p. x).

¹ It is unclear where this figure, 95%, comes from. If understood as written, 95% of words, this statement is quite misleading. When one considers the number of topic-specific words (e.g., technical terms) that occur in academic writing, it is very unlikely that 2,570 word families (i.e., the word families on the GSL and AWL combined) would account for 95% of the lexical diversity in any academic text. If Huntley is referring to percentage of total word coverage, this figure is notably higher than has been proposed in the research.

Undoubtedly, knowing the words on the AWL, coupled with the words on the GSL, will aid students in their academic reading comprehension. Numerous studies have demonstrated that these combined lists can provide nearly 86% of total word coverage of academic texts (Nation, 2001). Thus, a focus on these lists can put learners well on their way toward the word coverage that they will need to comprehend required texts. However, is the claim that proposed word lists indeed contain the essential words (that these lists are superior to other possible word lists) truly attested? This is a critical question. Answering it requires detailed corpus-based analysis of lexical diversity and variability in target domains. Surprisingly, however, throughout the long history of word list development research, this question has not been addressed. What has yet to be assessed directly is the extent to which the corpora upon which word lists are based, and indeed the word lists generated from them, truly and reliably represent the lexical variability in their domains of interest. For teachers and learners, understandably, the mention of corpus linguistics, lexical diversity, and lexical variability may appear entirely theoretical, far removed from the actual practice of vocabulary teaching and learning. In the end, what teachers and learners simply want is a meaningful list of words that merit focus so that they can make the most efficient and productive use of their time. However, it is impossible to
know whether instructional and learning time is being well spent without asking very fundamental questions about the source of these lists and the assumptions upon which they have been based. A core assumption in the design of word lists has been that they are based upon truly representative corpora. The corpus that West (1953) based the GSL on contained five million words and included a wide variety of texts from a variety of topics and registers, so he felt it represented general English. Coxhead's (2000) Academic Corpus contained 3.5 million words from 414 texts from 28 different disciplines, so she felt it, and the resulting AWL, represented academic writing. Hyland and Tse's (2007) Academic Corpus included more contemporary texts, included student writing, and, according to the researchers, "more systematically represent[ed] a range of key genres in several fields" (p. 239), so they claimed it even better represented academic writing. Wang, Liang, and Ge's (2008) corpus of just one narrow genre, medical research articles, contained 218 complete research articles, an equal number from each of 32 different medical fields. Because of the careful, principled design of their corpus, they felt it provided an accurate representation of the vocabulary of their target domain. What is extremely surprising, however, is that, despite the tremendous amount of care, time, and thought that has been put into the design and compilation of corpora and the development of word lists, little to no assessment of actual lexical representativeness or word list reliability has occurred. The present study directly addresses these issues and investigates whether additional analysis might be included in estimations of the degree to which our corpora, and word lists based on them, truly represent the lexical variability and distributions possible in a given target domain. Such analysis would add validity
evidence to claims of corpus representativeness and, potentially, increase the reliability of word lists produced from these corpora.

1.4 Research Questions

Ideally, the field of vocabulary research interested in word list development must
address the following fundamental question: To what degree are corpora used for the creation of word lists, and the actual word lists themselves, representative of the lexical variability in the domains that they are designed to represent? Without actually having access to previously used corpora, I am not able to directly assess their representativeness in the present study. However, through an extended case study of one restricted domain, introductory psychology textbooks, I am able to provide a window into the degree of lexical variability in this domain, and the size and composition of a corpus required to represent its lexical variability. Findings from this case study can then provide insights regarding the representativeness of previously designed corpora, and, in turn, the reliability of influential word lists. More precisely, through this case study, I demonstrate a critical step involved in designing a corpus that is truly representative of the lexical variability present in the required readings in undergraduate introductory psychology classes. Though the scope of the present study may appear narrow and limited in practical application (how many people are actually looking for a list of vocabulary for just one of the many classes that students will have to take?), implications from the findings of this study are potentially wide reaching. These findings highlight the need for corpus-based vocabulary researchers
to consider corpus-internal criteria of lexical representativeness, a need that has so far been overlooked. Because of this oversight, there is reason to have reservations about the assumptions underlying our estimations of the lexical challenge learners face as well as the word lists we have produced to help learners meet this challenge. Toward this end, the following goals and associated research questions are addressed in this dissertation:

Goal #1: To provide an account of the lexical diversity in a restricted register (i.e., introductory textbooks) in one academic discipline (i.e., psychology).

Motivation: Describing the lexical diversity in this domain will allow us to understand the lexical challenge that will be encountered by EAP learners in one of their target language use tasks.

Research question #1: How many words are needed to read introductory psychology textbooks?

Goal #2: To identify the smallest, most efficient sample of texts that can capture a stable, reliable list of important words from one restricted register (i.e., introductory textbooks) in one academic discipline (i.e., psychology).

Motivation: Understanding the type of corpus required to represent the lexical variability in this target domain will allow me to assess the representativeness of my corpus and, in turn, the reliability of any word lists based on this corpus. More importantly, findings will allow word list designers and users to make inferences regarding the lexical representativeness of corpora used to produce word lists. Inferences
can also then be made regarding the reliability of the word lists themselves.

Research question #2: What is the smallest, most efficient sample of texts that can capture a stable, reliable list of important words from one restricted register (i.e., introductory textbooks) and discipline (i.e., psychology)?

By addressing these research questions, this dissertation will examine fundamental theoretical issues that have yet to be addressed in corpus-based vocabulary research. Further, this dissertation will offer methodological insights that future corpus-based vocabulary researchers can take into account in their quest to produce reliable word lists for teachers and learners in a variety of target use domains. The following chapter reviews previous research related to the classification of vocabulary and the creation of word lists for instructional and learning purposes. Chapter 3, then, outlines the present study's research design as well as the methodology employed for addressing the research questions noted above. Chapters 4 and 5 present the results of the analyses as well as a discussion of the findings. Finally, Chapters 6 and 7 provide further discussion of the findings as well as salient implications for the important endeavor of understanding lexical variability in, and designing lists of important vocabulary for, target language use domains.


Chapter 2 Review of the Literature

This chapter begins by reviewing the literature related to three areas of vocabulary research which motivate the current study and inform its methodological considerations. The first area of research reviewed has goals which could perhaps be considered more theoretical in nature. Writing in this area discusses various conceptualizations of the construct word and the classification of words into different types (Section 2.1). The goals of the second and third research traditions can be viewed as more applied. The goal of the second area of research reviewed is to better understand the vocabulary challenge encountered by learners in different target language use domains. Studies discussed in this section, Section 2.2, essentially ask the question: How many words are needed to accomplish different tasks in English? The third research tradition surveyed in this chapter is perhaps the most practical in nature: the construction of word lists for the purpose of focusing teaching and learning efforts (Section 2.3). Section 2.4, then, raises an issue that I argue has as yet been only partially addressed in corpus-based studies of vocabulary: corpus representativeness. The chapter culminates with relevant conclusions drawn from this review of the literature.

2.1 The Construct Word and the Classification of Words in Academic Texts

Before addressing questions regarding how many words learners need for their target use domain, before identifying lexical items that merit class time and individual learner efforts, it is important to understand what is meant by the construct word and how lexis in academic writing has been classified. The following discussion begins by
raising two issues that are critical considerations for understanding the construct word. First, three conceptualizations of the term word (word form, lemma, and word family) are delineated. Then, different types of multi-word units are briefly surveyed. Finally, I discuss ways in which these words have been classified by vocabulary researchers.

2.1.1 Word forms vs. lemmas vs. word families

This section begins with a question: How many different words are bolded in the following sentence? When I was on a run yesterday, I saw that my old friend was running on the trail ahead of me, so I ran faster to catch up. Depending on our conceptualization of the term word, the answer to this question could arguably be that the items bolded in the sentence above constitute one, two, or three words. If we consider each individual orthographic form a word, we could say there are three words (i.e., word forms): run, running, and ran. Alternatively, we could say that we have two words (i.e., lemmas), the noun RUN and the verb RUN, with ran and running being morphological variants of the single verb lemma RUN. Additionally, if our conceptualization of the term word is word family, we might even say that the bolded items constitute three forms, or members, of a single word (i.e., word family): RUN. This brief example highlights a fundamental issue that must be addressed in our understanding of vocabulary: simply, what we mean when we say word. As demonstrated above, conceptualizations of the construct word can range from treating all orthographic word forms as distinct words (e.g., book and books are two separate words), to treating all words sharing a classical root, regardless of additional morphemes,
as members of a word family (e.g., help, helping, helpful, unhelpful, and helpfulness are all members of the word family related to the headword HELP). Thus, if we want to answer questions regarding vocabulary diversity or vocabulary challenge, we must clarify our unit of analysis. Bauer and Nation (1993) propose a seven-level taxonomy for classifying morphological affixation and, ultimately, the relationships between word forms. This taxonomy, summarized below, is quite helpful in understanding the cline in the transparency of such relationships.

Level 1: Each form is a different word. This level has the most restricted, but perhaps the most clear-cut, criteria. Each orthographically distinct word is counted as a single word form. Book and books are considered two separate word forms.

Level 2: Inflectional suffixes. This level recognizes the relationship between words sharing the same base and inflectional suffixes. At this level, the relationship between book and books would be recognized.

Level 3: The most frequent and regular derivational affixes. At this level, words sharing the same base and including inflections and/or one of the 10 most frequent, regular derivational affixes (e.g., -able, -ish, non-, un-) are considered members of a single lexical unit.

Level 4: Frequent, orthographically regular affixes. This level widens the recognized Level 3 relationship to include additional frequent, orthographically regular affixes (e.g., -ful, -ity, -ment, in-).
Level 5: Regular but infrequent affixes. This level allows less frequent affixes that are orthographically regular (e.g., -age, -hood, anti-, bi-).

Level 6: Frequent but irregular affixes. At this level, the acknowledged relationship is widened to include base words with affixes that, while frequent, require orthographic modifications to the base for the affixation to occur. An example of this less transparent relationship would be permeate > permeable, which requires the deletion of -ate before the -able suffix is added.

Level 7: Classical roots and affixes. This level is the most liberal in its recognition of relationships between word forms. At this level comes the recognition of the relationship between essentially all word forms sharing classical roots. (Bauer & Nation, 1993)

Most of the influential vocabulary word lists used today have been based on the more inclusive levels of this taxonomy in their operationalization of word family. The GSL 2,000 (West, 1953), the BNC 3,000 (Nation, 2004), the UWL 737 (Xue & Nation, 1984), and the AWL 570 (Coxhead, 2000) are all lists of word families. Both Coxhead (2000) and Nation (2004) used Level 6 in identifying word families for the AWL and BNC 3,000, respectively. Organizing word lists around word families does have solid psycholinguistic support. It is often suggested that, with some understanding of inflectional and derivational morphology in English, learners should be able to decode the related meaning of words with shared stems, such as lucky and unlucky, without too much additional burden (Bauer & Nation, 1993; Coxhead, 2000; Nation, 2001). Findings from
Nagy et al. (1989) are often used to support this contention. In this study, the researchers found that the combined frequency of all members of a word family affects the efficiency (i.e., speed) of word recognition of individual members of a family. With evidence that morphological variants of a word may be accessed with such efficiency, there appears to be a psycholinguistic reality to the word family as a meaningful unit. It is also true, however, that not all word family member relationships are equally transparent. More importantly, there is bound to be a great deal of variability in what learners will find transparent, particularly with regard to morphological derivations. As Schmitt and Zimmerman (2002) note, learners find derivations challenging until they have amassed a substantial amount of reading exposure, a state of affairs that does not occur for many learners (p. 149). Many researchers have thus proposed that the choice of which unit to use as the unit of analysis be informed by two factors: a.) the use to which the word list will ultimately be put (e.g., receptive vs. productive vocabulary development); and b.) the experience and proficiency level of the learners using the word list. Regarding the goals of word list users, it has often been suggested that, for the purposes of developing receptive vocabulary (i.e., for listening or reading comprehension), word lists should be based on a more inclusive definition of the word family (e.g., Coxhead, 2000; Nation, 2001; Nation & Webb, 2010). This argument is based on the contention that recognizing and understanding a word's root will give clues to its meaning, regardless of affixation. For the purpose of developing productive skills (i.e., for speaking or writing), however, a less inclusive unit of measurement, for example, the lemma, has been considered more appropriate, as the ability to use one form
of a word (i.e., member of a word family) does not necessarily ensure the ability to use other forms of this word (Nation & Webb, 2010). Learner proficiency level and prior morphological training may also guide decisions on the chosen unit of analysis for a word list. Synthesizing research related to morphological acquisition, Gardner (2007) offers the following recommendations for operationalizing word for various groups of learners:

1.) Base Forms + Regular Inflections: younger children; low general English proficiency; low English literacy skills; no specific morphological training.

2.) Base Forms + Regular Inflections + Irregular Inflections + Derivational Prefixes: older children and adolescents; intermediate general English proficiency; intermediate English literacy skills; some morphological training.

3.) Base Forms + Regular Inflections + Irregular Inflections + Derivational Prefixes + Derivational Suffixes (regular) + Derivational Suffixes (irregular): adults; high general English proficiency; high English literacy skills; extensive morphological training or experience. (Gardner, 2007, pp. 258-259)

From Gardner's recommendations, we can infer his main point: the lower the language proficiency and the less the morphological training, the lower our expectations should be regarding a learner's ability to recognize relationships among word forms. Although the most influential word lists have been based on word families, an example of a contemporary list that is not is the 5,000-word list detailed in Davies and
Gardner's frequency dictionary of American English (2010). These 5,000 words comprise lemmas, using Francis and Kucera's (1982) definition of lemma: "a set of lexical forms having the same stem and belonging to the same major word class, differing only in inflection and/or spelling" (p. 1). For example, the lemma BEAT (n.) has the members beat (n.) and beats (n.); the separate lemma BEAT (v.) has the members beat (v.), beats (v.), beating (v.), and beaten (v.). In their introduction to this list, Davies and Gardner make a strong case for their choice of the lemma as the unit of analysis, based primarily on the collocational information we would lose from a more inclusive unit. Again using beat as an example, they note that the lemma BEAT (v.) has egg and drum as common collocates, whereas the lemma BEAT (n.) has an entirely different set of collocates (e.g., miss, hear, steady). This information might be lost without making the part-of-speech distinction through use of the lemma as unit of analysis. Schmitt (2010) offers a number of additional benefits of using the lemma as the unit of analysis, two of which are particularly relevant to this dissertation:

1.) The unit is relatively straightforward, which means that consumers of research studies will know what it means.

2.) It takes a lot of vocabulary to function in a language, and estimates based on word families may give the impression that less is necessary than is the case, especially as many consumers may simply interpret word family figures as words. (p. 193)

Schmitt's first point alludes to the challenge and murkiness of applying Bauer and Nation's seven levels (particularly Levels 3-6) to decisions regarding relationships between word forms.
As Gardner (2007) and Schmitt (2010) note, there is some level of disagreement regarding the many distinctions that must be made. Schmitt's second point is a reminder that, by using word families as the unit of analysis, we are assuming that learners have a great deal of morphological knowledge. We are assuming, for example, that a learner who comprehends the word accurate is able to recognize the common root and appropriately apply the meaning of affixes to the following word family members: accurately, accuracy, and inaccuracy. This assumption perhaps overestimates knowledge, thus underestimating the lexical challenge. Though Schmitt does not say this directly, assumptions at the opposite extreme that would lead to the use of the word form as the unit of analysis (i.e., that learners have no morphological knowledge) run the risk of the opposite problem: overestimating the lexical challenge. Thus, using the lemma, it would seem, is a compromise between these two extremes.
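To make the contrast among these units concrete, the minimal sketch below counts the run/running/ran example under each conceptualization. It is an illustration only: the part-of-speech tags and lemmas are hand-annotated (a real analysis would assign them with a tagger and lemmatizer), and the family grouping simply ignores word class rather than applying Bauer and Nation's levels.

```python
# Tokens from the example sentence, hand-annotated with part of speech
# and lemma; a corpus study would produce these with a tagger/lemmatizer.
tokens = [
    ("run", "NOUN", "run"),      # "on a run"
    ("running", "VERB", "run"),  # "was running"
    ("ran", "VERB", "run"),      # "ran faster"
]

# Word forms: every distinct orthographic form is a separate word.
word_forms = {form for form, pos, lemma in tokens}

# Lemmas: forms sharing a stem AND a word class collapse together,
# so RUN (n.) and RUN (v.) remain two distinct lemmas.
lemmas = {(lemma, pos) for form, pos, lemma in tokens}

# Word families: forms sharing a base collapse regardless of word
# class, leaving the single family headed by RUN.
families = {lemma for form, pos, lemma in tokens}

print(len(word_forms), "word forms")  # 3 word forms
print(len(lemmas), "lemmas")          # 2 lemmas
print(len(families), "word family")   # 1 word family
```

The three counts (3, 2, 1) reproduce the three possible answers to the question that opened this section, which is exactly why a study must declare its unit of analysis before reporting any vocabulary figures.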

2.1.2 Multi-word units

An additional issue to be considered with regard to the unit of analysis in vocabulary research is the existence of multi-word lexical units. Sinclair (1991), for example, defines a lexical item quite broadly as "one or more words that together make up a unit of meaning" (p. 281). His definition proposes that our notion of lexis should not stop at individual words. Rather, he contends that we must take into account "all patterning that was instigated by the presence of a central word" (p. 280). Among the lexical items making up a unit of meaning might be collocations, or words that occur together with greater statistical frequency than would be predicted by the frequencies of the individual words (Sinclair, 1991).

Gardner (2007) delineates a variety of additional multi-word unit types in English, including "phrasal verbs (call on, chew him out), idioms (rock the boat), fixed phrases (excuse me), and prefabs (the point is)" (p. 260). To help differentiate types of multi-word lexical items, he proposes yet another useful cline for different levels of accounting for multi-word sequences in vocabulary studies. These levels range from a narrow accounting, which includes only those units occurring within orthographic words (i.e., only going as far as including closed compound nouns), to more comprehensive accountings which would include, for example, identifying phrasal verbs, idioms, or other collocations as single units of meaning. Gardner's levels are summarized below, and a small computational illustration follows the list.

a. Zero Level: closed compound nouns (mailman).

b. Restricted Level: closed compound nouns (mailman) and hyphenated words (acid-free).

c. Moderate Level: closed compound nouns (mailman), hyphenated words (acid-free), open compound nouns (post office), and nonseparable phrasal verbs (call on).

d. Expanded Level: closed compound nouns (mailman), hyphenated words (acid-free), open compound nouns (post office), nonseparable phrasal verbs (call on), separable phrasal verbs (chew him out), idioms (rock the boat), fixed phrases (excuse me), and prefabs (the point is).

e. Maximal Level: all co-occurring word patterns (contiguous or noncontiguous) that form units of meaning. (Gardner, 2007, p. 260)
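As a rough computational illustration of what a Moderate Level accounting might involve, the sketch below merges listed open compound nouns and nonseparable phrasal verbs into single tokens before counting. The three-item unit inventory is purely hypothetical; a real accounting would require a large, empirically derived inventory and additional handling for separable and discontinuous units.

```python
import re

# Hypothetical inventory of multi-word units to treat as single items:
# open compound nouns and nonseparable phrasal verbs (Moderate Level).
MULTIWORD_UNITS = ["post office", "call on", "mail carrier"]

def tokenize_moderate(text: str) -> list[str]:
    """Lowercase, merge listed multi-word units, then extract tokens."""
    text = text.lower()
    for unit in MULTIWORD_UNITS:
        # Join the unit's words with "_" so it survives tokenization.
        text = text.replace(unit, unit.replace(" ", "_"))
    return re.findall(r"[a-z_']+", text)

print(tokenize_moderate("I will call on her at the post office."))
# ['i', 'will', 'call_on', 'her', 'at', 'the', 'post_office']
```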


Previous research has demonstrated the wide prevalence of such multi-word sequences in English. For example, Erman and Warren (2000) found pre-constructed multi-word phrases (including some collocations, lexical bundles, and idiomatic expressions) to account for over half of both spoken and written English. With specific reference to academic writing, Biber et al. (1999) found that one type of multi-word unit, lexical bundles (i.e., "extended collocations: bundles of [three or more] words that show a statistical tendency to co-occur," p. 989), accounts for approximately 21% of academic prose. Biber et al. (2004) and Conklin and Schmitt (2008) stress the importance of these bundles due to their frequency and their utility. Further, not only are such units "prolific" in terms of frequency and coverage, but there is converging evidence of their important role in language processing. For example, both Underwood, Schmitt, and Galpin (2004) and Conklin and Schmitt (2008) found that idiomatic formulaic sequences (e.g., "a breath of fresh air") are processed more quickly than strings of words without such idiomatic association, thus concluding that such sequences are stored as single units of meaning or pragmatic function. With regard to collocations, research by Hoey (2005) and Durrant and Doherty (2010) similarly concludes that collocations have a psycholinguistic, rather than just a statistical, reality. As evidence of the increasing recognition of the important role of multi-word lexical items, lists of this nature have been proposed for individual academic disciplines (e.g., engineering: Ward, 2007), for academic writing in general (e.g., Biber, Conrad, & Cortes, 2004; Simpson-Vlach & Ellis, 2010; Durrant, 2009), and even for general English (e.g., Shin & Nation, 2008; Davies & Gardner, 2010).
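The frequency-driven identification of lexical bundles described by Biber et al. (1999) can be sketched as below: extract every contiguous four-word sequence and keep those that recur frequently and across multiple texts. The thresholds used here are illustrative placeholders only; published studies use normalized frequency cutoffs (e.g., occurrences per million words) and register-specific range requirements.

```python
from collections import Counter

def four_grams(tokens):
    """All contiguous four-word sequences in a single text."""
    return [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]

def lexical_bundles(texts, min_freq=3, min_texts=2):
    """Four-word sequences that are both frequent and widely dispersed.

    `texts` is a list of token lists, one per text. The frequency and
    range thresholds are illustrative, not Biber et al.'s cutoffs.
    """
    freq = Counter()
    text_range = Counter()
    for tokens in texts:
        grams = four_grams(tokens)
        freq.update(grams)
        text_range.update(set(grams))  # each text counts once for range
    return {g for g, f in freq.items()
            if f >= min_freq and text_range[g] >= min_texts}

texts = [
    "on the other hand it is clear".split(),
    "on the other hand we argue that".split(),
    "it is clear that on the other hand".split(),
]
print(lexical_bundles(texts))  # {('on', 'the', 'other', 'hand')}
```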


2.1.3 Classification of vocabulary in academic texts

In addition to operationally defining the construct word and other lexical units
of meaning, a great deal of work has gone into understanding the distributions of vocabulary in written texts and into classifying these words according to their use and distributions. There has been some variability in terms of classifications made for the vocabulary found in academic texts, as well as of the terms used for these distinctions, but, in general, a three-way distinction can be made. Nation (2001) uses the following terms:

- High-frequency vocabulary
- Specialized vocabulary
- Low-frequency vocabulary

For the most part, these classifications share both qualitative and, in a general sense, quantitative attributes across studies. Each classification can be seen as a level defined in large part by increasing distributional restrictions, specifically frequency (i.e., the number of occurrences in a corpus) and range (i.e., the number of texts or sections in which a word occurs), and in some cases, dispersion (i.e., the evenness of occurrence distribution across texts or sections in a corpus) (e.g., Coxhead, 2000; Nation, 2004). At the least restricted level is high-frequency vocabulary. Less frequent vocabulary with a narrower range is specialized vocabulary, and even more restricted vocabulary is low-frequency vocabulary. It is not the case, however, that as distributions become more restricted, we simply end up with a subset of a less restricted set. For example, specialized vocabulary is not simply a subset of general vocabulary. Rather, there is often a line drawn between classifications making them mutually exclusive². For example, specialized vocabulary is the set of items whose distribution is too narrow to meet the criteria of high-frequency vocabulary. Conversely, high-frequency vocabulary is found across too many registers to be considered specialized. This classification is depicted in Figure 1.
[Figure 1 near here: a tree diagram dividing Vocabulary in Academic Texts into High-frequency / General Vocabulary (e.g., GSL), Specialized Vocabulary, and Low-frequency Vocabulary, with Specialized Vocabulary further divided into Non-technical Academic Vocabulary (e.g., AWL) and Technical Academic Vocabulary.]

Figure 1. Classification of words in academic writing

Word-class distinctions have also been drawn by means of distributional profile comparison across corpora representing different use domains (e.g., general English vs. academic English) to determine appropriate classification (e.g., Coxhead, 2000; Yang, 1986). Expert judgment (e.g., Farrell, 1990; Oh et al., 2000) and textual clues (Bramki & Williams, 1984; Chung & Nation, 2004) have also been employed to assist in word classification.
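Because the three distributional criteria introduced above (frequency, range, and dispersion) underlie all of these classification schemes, a minimal sketch of how they might be computed over a corpus of tokenized texts is given below. Dispersion is calculated here as Juilland's D, under the simplifying assumption that the corpus parts are roughly equal in size.

```python
from collections import Counter
from statistics import mean, pstdev

def distributional_profiles(texts):
    """Frequency, range, and dispersion for each word across `texts`.

    `texts` is a list of token lists, one per text (or corpus part),
    assumed roughly equal in length. Juilland's D is 1.0 for a word
    spread perfectly evenly and 0.0 for one confined to a single part.
    """
    n = len(texts)
    per_text = [Counter(t) for t in texts]
    vocab = set().union(*per_text)
    profiles = {}
    for word in vocab:
        counts = [c[word] for c in per_text]
        frequency = sum(counts)                   # occurrences in corpus
        text_range = sum(1 for c in counts if c)  # texts containing word
        cv = pstdev(counts) / mean(counts)        # coefficient of variation
        dispersion = 1 - cv / (n - 1) ** 0.5 if n > 1 else 1.0
        profiles[word] = (frequency, text_range, round(dispersion, 3))
    return profiles
```

A word list study can then apply its inclusion criteria directly to these three figures, for example, by keeping only words above a frequency cutoff that also occur in some minimum proportion of texts.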

² It must be noted, however, that word classification is somewhat dynamic, affected to varying degrees by domain. For example, special uses of high-frequency vocabulary may be classified as specialized vocabulary in certain domains. As well, a word that is classified as low frequency in one domain may be considered a technical word in another domain.


Below is a discussion of ways in which each of these classifications has been described and operationally defined.

2.1.3.1 High-frequency / general vocabulary

Schmitt (2010) uses the term general vocabulary to refer to the class of words Nation (2001) terms high-frequency vocabulary. Though their operational definitions are essentially the same, I feel that Schmitt's label, general vocabulary, better captures the essence of this classification. Whereas Nation's label alludes to one feature of this classification of words (i.e., a high rate of occurrence), words in this somewhat vague (Schmitt, 2010) classification are typically seen as those with both high frequency and broad range across a number of topics and registers. Because of their prevalence, Schmitt says that they are "the most basic and essential words in a language." Examples of general vocabulary would be those words on the GSL or perhaps the first two or three of the BNC 1,000-word frequency lists designed by Nation (2004).

2.1.3.2 Specialized vocabulary

In order to understand the designation specialized vocabulary, a further distinction must be drawn between non-technical specialized vocabulary (2.1.3.2.1 below) and technical specialized vocabulary (2.1.3.2.2 below). Following is a brief discussion of this distinction as it pertains to vocabulary in academic writing.

2.1.3.2.1 Non-technical academic vocabulary

Non-technical academic vocabulary is typically identified as the words that occur across a number of academic disciplines but with markedly lower range and frequency in general English. Various names have been used for this designation, including "university words" (Xue & Nation, 1984), "academic words" (Coxhead, 2000), "semi-technical words" (Flowerdew, 1993), and "sub-technical words" (e.g., Baker, 1988; Cowan, 1974). Cowan (1974) defines items in this classification as "context independent words which occur with high frequency across disciplines" (p. 391). Baker (1988) describes them as "neither highly technical and specific to a certain field of study nor obviously general in the sense of being everyday words which are not used in a distinctive way in specialized texts" (p. 91). Synthesizing previously proposed definitions of academic vocabulary, Coxhead (2000) describes these items as words which are not highly salient in a particular text (i.e., not key, topic-specific terms) and "supportive but not central to the topics of the texts in which they occur" (Coxhead, 2000, p. 212). Similarly, Hyland and Tse (2007) define academic vocabulary as that lexis which is "reasonably frequent in a wide range of academic genres but relatively uncommon in other kinds of texts" (p. 235). The unifying theme of these definitions is that this set of vocabulary lies somewhere in between general vocabulary and discipline- or topic-specific vocabulary. A clear distinction often seems to be made between words related to the actual topics of interest in particular disciplines and the words related to the rhetorical functions that take place in these disciplines. Nation and Coxhead (2001), Cobb and Horst (2001), and Granger and Paquot (2009) see a key feature of academic vocabulary being that it expresses not so much what the disciplines are about, but, rather, what they do. For example, Baker (1988) suggests that many academic words express notions general to all or several specialized disciplines (e.g., factor, method, and function) and perform common rhetorical functions, such as marking an evaluation of the material presented (p. 92). Vocabulary without such broad use, technical academic vocabulary, is discussed in the next section.

2.1.3.2.2 Technical academic vocabulary

As with non-technical specialized academic vocabulary, words falling into the technical vocabulary category have been referred to by a number of different labels, including "technical words" (Farrell, 1990), "terminology" (Chung, 2003), "terminological words" (Becka, 1972), "specialised lexis" (Baker, 1988), "specialist language" (Woodward-Kron, 2008), and "technical terms" (Woodward-Kron, 2008; Yang, 1986). Nation (2001) describes technical vocabulary as lexical items which are "recognizably specific to a particular topic, field or discipline" (p. 198). Schmitt (2010) notes that these items may range from those which are recognizably specific to a field to those with the same form as high-frequency items but specialized meanings within a field (e.g., the use of memory in computer science) (p. 77). Nation (2001) notes that there may be different levels of "technicalness," depending on a word's relationship to a particular field. The four categories below describe this range:

Category 1. The word form appears rarely, if at all, outside this particular field.

Category 2. The word form is used both inside and outside this particular field, but not with the same meaning.

Category 3. The word form is used both inside and outside this particular field, but the majority of its uses with a particular meaning, though not all, are in this field. The specialized meaning it has in this field is readily accessible through its meaning outside the field.

Category 4. The word form is more common in this field than elsewhere. There is little or no specialization of meaning, though someone knowledgeable in the field would have a more precise idea of its meaning. (Nation, 2001, pp. 198-199)

Such a cline poses a challenge for researchers attempting to identify technical words. In general, technical words have been identified by two methods: a.) quantitative analysis of distributional profiles; and b.) expert judgment. Analysis of distributional profiles involves comparing the profiles of individual words in a text (or texts) representing a particular topic or field with their profiles in a more general corpus (Becka, 1972; Sutarsyah, Nation, & Kennedy, 1994; Yang, 1986). For example, words that occur with a much higher frequency in a certain discipline than across a variety of disciplines might be technical vocabulary.
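A bare-bones version of such a distributional comparison is sketched below: each word's normalized frequency in a field corpus is divided by its normalized frequency in a general reference corpus, and words exceeding some ratio are flagged as technical candidates. Both the 50x cutoff and the add-one smoothing are illustrative choices of mine, not values drawn from the studies cited above.

```python
def technical_candidates(field_counts, general_counts,
                         field_size, general_size, ratio=50.0):
    """Flag words far more frequent in a field corpus than in general use.

    The count dictionaries map word -> raw frequency; the sizes are
    total running words in each corpus. The ratio threshold is purely
    illustrative.
    """
    candidates = {}
    for word, f in field_counts.items():
        field_norm = f / field_size
        # Add-one smoothing on the reference count avoids division by
        # zero for words absent from the general corpus.
        general_norm = (general_counts.get(word, 0) + 1) / general_size
        r = field_norm / general_norm
        if r >= ratio:
            candidates[word] = r
    return candidates
```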

Regarding the use of expert judgment, several studies have used a rating scale, in which disciplinary experts use their intuition to determine the degree to which lexical items are related to their particular field (e.g., Baker, 1988; Chung, 2003; Chung & Nation, 2003). For example, Chung (2003) had experts judge the "technicalness" of vocabulary in anatomy texts using the following scale (steps 3 and 4 were considered terms, i.e., technical vocabulary, while steps 1 and 2 were considered non-terms):

Step 1: Words such as function words that have a meaning that has no relationship with the field of anatomy, that is, words independent of the subject matter.

Step 2: Words that have a meaning that is minimally related to the field of anatomy in that they describe the positions, movements, or features of the body.

Step 3: Words that have a meaning that is closely related to the field of anatomy. Such words are also used in general language. The words have some restrictions of usage depending on the subject field.

Step 4: Words that have a meaning specific to the field of anatomy and are not likely to be known in general language. These words have clear restrictions of usage depending on the subject field. (Chung, 2003, pp. 228-229)

Regardless of the method used to determine classification, words in this category comprise a considerable proportion of running words in academic texts. Sutarsyah and colleagues (1994) found that 10% of an economics textbook consisted of words which could be classified as technical. Chung and Nation (2003), however, suggest that technical words account for far more of academic texts than previously proposed. They found technical words to account for 31.2% of running words in an anatomy textbook (37.6% of total word types) and 20.6% of running words (16.3% of word types) in an applied linguistics book.

2.1.3.3 Low-frequency vocabulary

The low-frequency vocabulary classification of words simply denotes those items which occur with limited frequency in a corpus. Nation and Hwang (1995) further note that this limited frequency is often associated with narrow range.


Nation (2001) suggests that low-frequency words constitute the largest single class of words, comprised of several types of items. They may be proper nouns (e.g., the name of a person or place), words that would be considered technical words in other domains (e.g., the word morphology found in a business corpus), words that narrowly failed to meet the criteria for high-frequency words, or words that we simply rarely encounter in our use of the language. Schmitt (2010) notes that many researchers have identified as low frequency those words which simply do not meet the frequency criteria of commonly used lists of high-frequency vocabulary (e.g., the GSL). He suggests a reconsideration of this view, however, in light of recent findings from Nation (2006), who argues that the number of word families needed for adequate coverage of texts is much higher than previously thought: to reach 98% coverage of a text, approximately 6,000-7,000 word families are needed for listening comprehension and 8,000-9,000 word families are needed for reading comprehension. As a result of Nation's increased estimate, Schmitt proposes an alternative approach to conceptualizing low frequency: low-frequency words should be operationalized as those beyond the 6,000-9,000 word families needed for adequate comprehension. Though he admits that the frequencies of word families may decrease rapidly after the most frequent 2,000, he proposes that the gap between these 2,000 word families and the 8,000-9,000 word families required for successful reading be termed mid-frequency. Anything beyond this, he suggests, is a luxury and clearly not essential.

Though the essentialness of these words is perhaps debatable, it is clear that they do comprise a substantial percentage of the words in written texts. Nation (2001) suggests that this percentage is 5%, while Chung and Nation (2003) found low-frequency words to comprise 11.8% of the anatomy textbook analyzed in their study. Further, Alderson (2007) notes that, despite the low frequency of these words, "it is important [for learners] to know, or to be able to deal with, low-frequency words in order to understand a text easily, with minimal disturbance, or for pleasure" (p. 384). With word coverage of low-frequency words as high as 11.8%, Alderson's assertion would seem to have merit.

Despite their noticeable presence in written texts, however, low-frequency words have not typically been a concern of researchers trying to identify candidates for instruction. Though this entire class of words may indeed account for a considerable percentage of the word types used in English (e.g., over 80% of word types in the 100,000,000-word BNC occur fewer than 10 times; Leech, Rayson, & Wilson, 2001), the individual words contribute very little to coverage due to their rarity. As a result, Schmitt (2010) suggests that these words should be learned through extensive reading rather than direct instruction, and that if they are addressed in instruction, the focus should be on developing word-learning strategies rather than on learning the meanings of the individual words themselves.
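The type/token asymmetry underlying this argument is easy to demonstrate computationally. The following sketch, which assumes only a hypothetical list of corpus tokens, contrasts the share of word types that are rare with the share of running words (i.e., coverage) those types account for.

```python
from collections import Counter

def rare_word_profile(tokens, max_freq=9):
    """Share of word TYPES occurring at most max_freq times, versus the
    share of running TOKENS (i.e., coverage) those rare types provide."""
    counts = Counter(tokens)
    rare_counts = [c for c in counts.values() if c <= max_freq]
    type_share = len(rare_counts) / len(counts)
    token_share = sum(rare_counts) / len(tokens)
    return type_share, token_share
```

In a corpus like the BNC, the first figure can exceed 0.80 while the second remains marginal, which is precisely why individual low-frequency words are poor candidates for direct instruction despite their collective prominence among word types.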

2.2 How many words do learners need?

To fully understand the nature of the vocabulary learning challenge, and as a means for setting curricular and learning goals, another important strand in vocabulary research has sought to determine how many words are needed to accomplish the tasks that learners hope to perform. These studies have asked two primary questions: 1.) What is the task that learners are trying to accomplish? 2.) What percentage of word coverage allows for adequate comprehension? Once these two questions have been answered, we can begin to estimate the number of words learners require to achieve the required level of comprehension.

In answering the first question, some vocabulary researchers have classified tasks with regard to whether they are comprehension- or production-based (e.g., Waring & Nation, 1997), but the vast majority of studies focus on the vocabulary demands of comprehension tasks (e.g., Hirsh & Nation, 1992; Nation, 2006; Ward, 1999; Webb & Rogers, 2009a, 2009b). The primary divide is then made between modes of input: aural (i.e., listening comprehension tasks) vs. written (i.e., reading comprehension tasks). Typically, comprehension tasks are then further delineated by genre (e.g., listening to television programs vs. movies; reading novels vs. newspapers).

With regard to reading tasks, for many years it was believed that a receptive vocabulary allowing for 95% text coverage would suffice for adequate comprehension of a text. This figure was based on a 1989 study by Laufer which had a decade-long impact on estimates of the coverage required for comprehension. In this study, Laufer had participants, Israeli EAP students, underline unknown words in a text and then answer comprehension questions about that text. She then looked at the relationship between the amount of unknown vocabulary identified and reading comprehension scores. Using 55% comprehension as the measure of adequate comprehension, she found that significantly more participants scored 55% or better when they understood words accounting for 95% or more of the text.
Thus, she concluded that vocabulary knowledge accounting for 95% word coverage was the threshold for adequate reading comprehension.

This vocabulary threshold for adequate comprehension was revised upward to 98% by Hu and Nation (2000). In their study, they operationalized adequate comprehension in a different way: reading comprehension test scores equal to those earned by test takers whose vocabulary knowledge provided 100% text coverage (a score of approximately 71% when scores on two comprehension tasks were averaged). Their findings suggested that 98% word coverage allowed most participants to achieve adequate comprehension, as measured by the test (i.e., a comprehension score of 71%).

A more contemporary and better-designed investigation into the relationship between vocabulary knowledge and reading comprehension was conducted by Schmitt, Jiang, and Grabe (2011). This study compared participants' self-identified knowledge of vocabulary items (i.e., through a yes/no form-meaning checklist format) with their performance on a reading comprehension test. Their findings suggest that the vocabulary coverage required for adequate comprehension is closer to the figure proposed by Hu and Nation (2000): to achieve over 68% comprehension, participants needed to know vocabulary accounting for a minimum of 98% text coverage.

A host of additional studies have sought to determine how many words are required to achieve adequate comprehension. Adequate comprehension has been operationalized as either a minimum score on a comprehension test (e.g., Laufer, 1992) or, more often recently, simply as reaching the vocabulary text coverages proposed as thresholds for adequate comprehension (e.g., Adolphs & Schmitt, 2003; Nation, 2006; Webb & Rogers, 2009a).
In 1992, Laufer compared the vocabulary size of Israeli EAP students (as determined by a vocabulary levels test) with their reading comprehension scores to determine what level of vocabulary knowledge would allow for adequate comprehension. When adequate comprehension was operationalized as a score of 56% or better on a reading comprehension test, participants needed a receptive vocabulary of 3,000 word families to achieve this score. She also found that for each additional 1,000-word level at which participants tested, comprehension scores increased by approximately 7%.

Waring and Nation (1997) proposed that 95% coverage of written texts could be attained with 3,000-5,000 word families, based on text coverage of the Brown Corpus (Francis & Kucera, 1982) and of a corpus of adolescent fiction (Hirsh & Nation, 1992). Interestingly, in this study, Waring and Nation additionally propose that for productive tasks (i.e., speaking or writing) "it is possible to make use of a smaller number" of words, 2,000-3,000 (p. 10), though it is unclear how they arrived at this figure.

Six years later, Adolphs and Schmitt (2003) investigated the amount of vocabulary required to participate in everyday spoken discourse. They analyzed text coverage in two spoken corpora: the five-million-word Cambridge and Nottingham Corpus of Discourse in English (CANCODE) and the 10-million-word spoken component of the BNC. Acknowledging that there may be no one lexical coverage figure which would supply an adequate amount of vocabulary in every situation, the researchers offer a tentative figure of approximately 3,000 word families (or 5,000 individual word forms) as a reasonable target (i.e., approximately 96% word coverage) for participation in everyday spoken discourse (p. 433).
A more recent trend in estimating vocabulary size requirements for different tasks is to base estimates on the BNC sublists (Nation, 2006). In short, the BNC lists comprise a series of 1,000-word lists arranged in order of frequency of occurrence in the British National Corpus; words on these lists first had to meet range and dispersion (evenness of distribution) criteria and were then arranged in order of frequency (see Nation, 2004, for more detail). For example, the BNC 1K is the 1,000 most frequent word families in the BNC, the BNC 2K is the second 1,000 most frequent word families, and so on (see the next section for more detail on the development of these lists). Researchers then assess the coverage of a text or corpus provided by the BNC 1K, BNC 2K, etc., until the desired coverage (e.g., 98%) is met. The vocabulary size required is then based on the BNC frequency band at which the desired coverage is met. For example, if 98% coverage of a certain kind of text is met at the BNC 5K level, researchers conclude that a vocabulary of 5,000 words is necessary to comprehend this kind of text. Using this methodology, Nation (2006) proposes that a vocabulary of 8,000-9,000 words is required to read a novel or a newspaper, and that a vocabulary of 6,000-7,000 words is needed to comprehend a children's movie (i.e., Shrek) or unscripted spoken interaction (i.e., interview and conversation from the BNC). Webb and Rogers used Nation's methodology to assess the vocabulary size required to achieve adequate comprehension of television programs (2009a) and movies (2009b). Their estimates offer a much wider range of required vocabulary sizes, depending on genre (e.g., crime, drama, comedy) and, to some degree, variety of English (i.e., British or American English). Achieving 95% coverage of television programs required a vocabulary of 2,000-4,000 words, while 98% coverage required a vocabulary of 5,000-9,000 words. Achieving comparable coverage of movies required slightly higher vocabulary levels: 3,000-5,000 words for 95% coverage and 6,000-10,000 words for 98% coverage.

While the studies discussed above assess the vocabulary challenge presented by a variety of use domains, two notable studies have assessed this challenge with specific regard to academic reading comprehension. A very often-cited study by Hazenburg and Hulstijn (1996) investigated the vocabulary load of readings encountered by Dutch university students. They concluded that a minimum of 10,000 headwords (akin to word families) plus proper nouns allowed for 95% word coverage of the Dutch texts, hence the contention that 10,000 words be the minimum target for students hoping for adequate comprehension of academic texts. More recently, Laufer and Ravenhorst-Kalovski (2010) revisited the issue, revising Laufer's (1989) operationalization of adequate comprehension from a 55-56% reading comprehension score to a norm-referenced measure. When adequate comprehension was operationalized as the ability to read independently (as predicted by an Israeli national university entrance test), the text coverage required was found to be approximately 98%, achieved by BNC bands up through 6K-8K.
Before going forward, it is important to distinguish between Nation's (2006) method, also used by Webb and Rogers (2009a; 2009b) and Laufer and Ravenhorst-Kalovski (2010), and previous methods of assessing vocabulary size requirements. Previous studies (e.g., Adolphs & Schmitt, 2003; Hazenburg & Hulstijn, 1996) determined how many actual words were needed to reach the desired coverage in a certain text or corpus: a required vocabulary size of 2,000 word families for just short of 95% coverage means that 2,000 word families will provide 95% coverage of the corpus that was analyzed. The more recent methodology employed by Nation (2006) and others asks a different question. It asks not so much how many word families are needed to achieve adequate coverage of a certain text or corpus, but how many of the BNC 1,000-word lists are required to account for the desired coverage of a text. For example, Nation (2006) found that the novel Lady Chatterley's Lover was written with just over 5,000 word families. However, he did not conclude that it takes a bit over 5,000 word families to read this novel. Instead, because words up through the BNC 9K were required to provide 98% coverage of this novel, he concluded that reading a novel requires a vocabulary of 8,000-9,000 word families. Table 1, from Nation (2006, p. 70), illustrates how this calculation was made. It is important to note that Nation's calculation does not imply that a reader necessarily, or even likely, requires 8,000-9,000 words to read any single novel, but that 98% coverage of novels in general can be attained with a vocabulary of 8,000-9,000 words, specifically, the BNC 1K through the BNC 8K or 9K.
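The band-based calculation illustrated in Table 1 can be made concrete with a short cumulative-coverage routine. The sketch below illustrates the logic only, not Nation's actual implementation: band_lists stands in for the ordered BNC 1,000-family lists, and tokens for a text reduced to word-family headwords, both hypothetical inputs here.

```python
def bands_needed(tokens, band_lists, target=0.98):
    """Return the number of 1,000-family frequency bands required for the
    cumulative coverage of `tokens` to reach `target`, plus that coverage."""
    total = len(tokens)
    known = set()
    coverage = 0.0
    for i, band in enumerate(band_lists, start=1):
        known.update(band)
        coverage = sum(1 for t in tokens if t in known) / total
        if coverage >= target:
            return i, coverage
    return None, coverage  # target never reached, even with all bands
```

On this logic, a text whose 98% point falls within the ninth band is said to demand a 9,000-family vocabulary, regardless of how many distinct word families the text itself actually contains.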

This section has surveyed investigations which have sought to gain an understanding of the vocabulary challenge that learners face in different contexts. The next section provides an overview of the many word list development studies that have sought to inform teachers and learners in their attempts to meet this challenge.

Table 1
Cumulative Percentage Coverage Figures for Lady Chatterley's Lover by the Fourteen 1,000 Word-families from the BNC, with and without Proper Nouns

Word list (1,000)    Coverage without proper nouns (%)    Coverage including proper nouns (%)
1                    80.88                                82.93
2                    88.09                                90.14
3                    91.23                                93.28
4                    93.01                                95.06
5                    94.08                                96.13
6                    94.77                                96.88
7                    95.38                                97.43
8                    95.85                                97.90
9                    96.17                                98.22
10                   96.41                                98.46
11                   96.62                                98.67
12                   96.82                                98.87
13                   96.93                                98.98
14                   96.96                                99.01
Not in the lists     97.92                                100.00

Note. From "How Large a Vocabulary is Needed for Reading and Listening?" by I.S.P. Nation, 2006, Canadian Modern Language Review, 63(1), p. 70.

2.3 Historical Overview of Word List Development Studies

As noted in Chapter 1, a great deal of vocabulary research has been focused on the identification of sets of words that merit instructional focus. This section provides a brief review of some influential word list development studies, from attempts to develop lists of general English vocabulary to studies aimed at developing lists of important vocabulary for more restricted, specific purposes.

2.3.1 General English vocabulary lists

Perhaps the best known of the lists of high-frequency "general" vocabulary in English is the General Service List (GSL) of 2,000 word families published by West (1953). However, this was not the first such list to be proposed.
For example, over three decades earlier, Thorndike (1921) constructed a list of the 10,000 most frequently occurring words from his corpus of 4.5 million words comprising 41 different texts. Thorndike revised his list over the next two decades, proposing a list of 20,000 words in 1931 and, with a colleague, a list of 30,000 words in 1944 (Thorndike & Lorge, 1944). These early undertakings were exceptionally impressive in that large corpora were assembled, all frequency and range information was collected, and all semantic tagging was done manually. It is also important to note that these early efforts were intended for pedagogical rather than purely descriptive purposes. This goal had an effect on both the corpora used and the criteria for word selection. Regarding corpora, Thorndike (1921), for example, culled his list from a corpus that was intended to reflect the material that would be encountered in schools. The 41 texts comprising his corpus were "about 625,000 words from literature for children; about 3,000,000 words from the bible and English classics; about 300,000 words from elementary-school text books; about 50,000 words from books about cooking, sewing, farming, the trades and the like; about 90,000 words from the daily newspapers; and about 500,000 words from correspondence" (p. iii). The corpus from which West (1953) culled the GSL was approximately five million running words comprising "many sources, such as encyclopedias, magazines, textbooks, novels, essays, biographies, books about science, poetry, and the like..." (Lorge, 1949, in West, 1953, p. xi).

In addition, while proposed word lists were based largely on the range and frequency of vocabulary, additional criteria were often taken into account.


In constructing the GSL, for example, West purposely left off words that he considered highly emotional, potentially offensive, colloquial, or slang, regardless of their frequency in his corpus. Learners, he felt, needed to be able to express ideas, not emotions, and he felt that these ideas could be expressed without colloquialisms. Further, to cover as wide a range of notions as possible, West left off words expressing notions already covered by more frequent words and replaced them with less frequent words with different semantic values. Despite the manipulations that West made to design a list suiting his intended purpose, this list of 2,000 word families provides surprisingly, and consistently, high coverage across different registers: approximately 76-80% (Nation, 2001).

A more recent undertaking to identify words which are frequent across a range of texts and topics in a general English corpus (i.e., general English vocabulary) is Nation's (2004) analysis of the 100,000,000-word British National Corpus (BNC), which resulted in the BNC 3000. The BNC was designed to be a balanced corpus of contemporary spoken and written English, sampled widely from speech (e.g., conversation, TV/radio broadcasting) and writing (e.g., novels, non-fictional expository writing) in proportions reflecting "their representation of everyday English use" (Leech, Rayson, & Wilson, 2001, p. 1). To this end, 93% of the texts were published, or recorded and transcribed, between 1985 and 1994. The 4,124 text files in the corpus are either complete texts, compiled collections of short, related texts, or "substantial sample[s] of long text" (p. 1). Ninety percent of the corpus represents written English and 10 percent represents spoken English. From this corpus, Nation (2004) extracted three 1,000-word lists using three criteria: 1. range (occurrence in 95-98 of the 100 one-million-word sections); 2. frequency (10,000 occurrences in the whole corpus); 3. dispersion (evenness of distribution across the 100 one-million-word sections, operationalized as a Juilland's D dispersion coefficient of .80). To this frequency list he added days of the week, names of months, numbers, letters of the alphabet, and the words goodbye, OK, and Oh. He found considerable overlap between his list and West's GSL, particularly in the first 1,000 words, and found that the AWL words were dispersed throughout the three lists. He also found that his 3,000 word families accounted for an impressive 86.5-91.4% of running words in four different corpora, concluding that the BNC 3000 could serve as an alternative to the GSL + AWL.

An even more recent undertaking, by Mark Davies and Dee Gardner (2010), has produced A Frequency Dictionary of Contemporary American English, a list that they describe as the 5,000 most frequently used words in the language (unlike previous studies, Davies and Gardner used the lemma as their unit of analysis, so 5,000 words here means 5,000 lemmas). This list was based on the Corpus of Contemporary American English (COCA), which, at approximately 400 million words, might be considered the American English counterpart to the BNC. The 150,000 texts comprising this corpus are evenly balanced between unscripted spoken English, fiction, popular magazines, newspapers, and academic journals. Similar to the BNC lists, the 5,000 words on Davies and Gardner's list were selected based on their frequency and dispersion in the COCA. A type of salience score was calculated for each word by multiplying each word's total frequency in the corpus by its dispersion score (Juilland's D), the latter of which could range from 0.0 to 1.0, with 1.0 meaning perfectly even distribution across 100 four-million-word sections of the corpus. Words were then ordered by this score, and the top 5,000 were selected for the list.
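This frequency-by-dispersion weighting is simple to express in code. The sketch below assumes a hypothetical section_counts table giving each word's frequency in each of 100 equal corpus sections; it illustrates the general idea rather than Davies and Gardner's actual procedure. Juilland's D is computed here in its usual form, 1 - (sd/mean)/sqrt(n - 1).

```python
import math
from statistics import mean, pstdev

def juilland_d(section_freqs):
    """Juilland's D over n equal corpus sections: 1.0 indicates a perfectly
    even distribution; values near 0 indicate heavy clumping."""
    n = len(section_freqs)
    m = mean(section_freqs)
    if n < 2 or m == 0:
        return 0.0
    return 1 - (pstdev(section_freqs) / m) / math.sqrt(n - 1)

def salience_ranking(section_counts, top_n=5000):
    """Rank words by total frequency weighted by dispersion (Juilland's D)."""
    scores = {word: sum(freqs) * juilland_d(freqs)
              for word, freqs in section_counts.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Weighting frequency by dispersion in this way demotes words that are frequent only because they are concentrated in a few texts or topics, which is precisely the behavior a general-purpose list is meant to filter out.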


2.3.2 General academic vocabulary lists

The first of the more influential lists of vocabulary for general academic purposes was Xue and Nation's (1984) University Word List (UWL). This list, designed specifically for students headed toward tertiary academic study, was constructed through the combination and adaptation of four similarly purposed lists already in existence. The first list was based on a corpus designed by Campion and Elley (1971), a 301,800-word collection of texts from journals, lectures, and exam papers, cross-referenced with textbooks and articles from the 19 most popular disciplines (in terms of enrollment) in New Zealand universities. The second list Xue and Nation consulted was the American University Word List (Praninskas, 1972), a list culled from 272,466 words from introductory university textbooks in 10 disciplines. Finally, Xue and Nation incorporated word lists created by Lynn (1973) and Ghadessy (1979), whose lists were composed of words that international students had annotated in course readings, the rationale for inclusion being that words giving these non-native English speakers difficulty were those worthy of instructional focus. All lists were combined, and redundant words were eliminated. A final criterion for inclusion in Xue and Nation's UWL was that the words not also be included in West's (1953) General Service List (GSL) or Thorndike and Lorge's (1944) Teacher's Word Book of 30,000 Words. The final list comprised 737 base words (i.e., word families), covering on average 8.5% of the Learned and Scientific sections of the Lancaster-Oslo/Bergen (LOB) Corpus of written British English and the Wellington Corpus of written English, and 9.8% of the Academic Corpus (Coxhead, 2000).


Coxhead (2000) noted a number of concerns with the UWL. First, as this list was an amalgam of four lists resulting from four separate studies, there was a lack of consistency in selection criteria as well as in the balance of register and disciplinary representation. The corpora upon which these lists were based, she suggested, lacked representativeness, primarily due to limited size and disciplinary distribution. The list she ultimately proposed, the Academic Word List (AWL), was based on a much larger corpus that she developed. Known as the Academic Corpus, it comprised approximately 3,500,000 words equally divided among four macro academic disciplines (art, commerce, law, and science), each composed of seven subdisciplines. She was also careful to choose texts in such a way that she achieved some balance in text length (i.e., equal numbers of short, medium, and longer texts) and avoided author bias (i.e., 400 separate authors were represented in her 414 texts). Among the 414 texts in this corpus, approximately 25% (114) came from the Brown Corpus (Francis & Kucera, 1982), the Wellington Corpus of Written English (Bauer, 1993), and the Lancaster-Oslo/Bergen (LOB) Corpus (Johansson, 1978). Aside from having to be words not included on West's GSL, words selected for her list had to meet three additional criteria:

1. Frequency: items had to occur a minimum of 100 times in the entire corpus
2. Range: items had to occur in at least 15 of the 28 subdisciplines represented
3. Dispersion: items had to occur a minimum of 10 times in each macro discipline

Her final list included 570 word families which met these criteria, altogether accounting for, on average, approximately 10% of the total running words in her Academic Corpus, with approximately 3-4% coverage variation among macro disciplines.


Thus, as Nation (2001) acknowledges, the AWL achieved slightly better coverage than the UWL with fewer word families.

2.3.3 Discipline-specific vocabulary lists

As noted in Chapter 1, over the past five or six years, academic vocabulary researchers have made a case for discipline-specific lists of important vocabulary (e.g., Hyland & Tse, 2007). Their efforts have resulted in lists that are slight modifications of the AWL (e.g., for medicine: Chen & Ge, 2007; for agriculture: Martinez et al., 2009; for applied linguistics: Vongpumivitch et al., 2009) or completely new discipline-specific lists (e.g., for medicine: Wang, Liang, & Ge, 2008). Coxhead and Hirsh (2007) have taken a different approach in designing their pilot list of specialized (i.e., science) vocabulary. They operationalize disciplinary vocabulary as vocabulary which a.) meets certain range, frequency, and dispersion criteria in a disciplinary corpus, and b.) does not appear on the GSL or AWL. This operationalization essentially reflects the view that disciplinary vocabulary is an addition to general English vocabulary (i.e., the GSL) and general academic vocabulary (i.e., the AWL). As noted above, specialized vocabulary is typically understood as vocabulary which does not occur on lists of general-use vocabulary such as West's (1953) General Service List or Thorndike and Lorge's (1944) Teacher's Word Book of 30,000 Words. Disciplinary vocabulary, then, has been operationalized as vocabulary meeting certain range, frequency, and dispersion criteria in a certain discipline (e.g., Martinez et al., 2009; Chen & Ge, 2007), sometimes with the added criterion that the word not also meet these criteria across multiple academic disciplines (e.g., Coxhead & Hirsh, 2007).
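Selection criteria of this kind (frequency, range, and dispersion thresholds applied to a corpus, minus words already on a prior list) are mechanical enough to sketch in code. The function below illustrates the general pattern using Coxhead's three AWL thresholds as defaults; the data structures and the gsl set are hypothetical placeholders, not her actual implementation.

```python
def select_word_families(stats, gsl, min_freq=100, min_subdiscs=15, min_per_macro=10):
    """Apply AWL-style selection criteria to candidate word families.

    `stats` maps each word family to a dict with keys:
      'freq'           - total frequency in the corpus
      'subdisc_counts' - frequencies in each of 28 subdisciplines
      'macro_counts'   - frequencies in each of 4 macro disciplines
    """
    selected = []
    for family, s in stats.items():
        if family in gsl:  # exclude general-service (Tier 1) words
            continue
        subdiscs_present = sum(1 for c in s['subdisc_counts'].values() if c > 0)
        if (s['freq'] >= min_freq
                and subdiscs_present >= min_subdiscs
                and all(c >= min_per_macro for c in s['macro_counts'].values())):
            selected.append(family)
    return selected
```

Changing the exclusion set (none, the GSL alone, or the GSL plus the AWL) yields the one-, two-, and three-tier operationalizations discussed below.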


However, there is yet another perspective on identifying important words in academic writing. Ward (1999; 2009) and Hancioglu, Neufeld, and Eldridge (2008) suggest that it is perhaps not necessary to differentiate between general vocabulary and academic vocabulary for students who have specialized goals at the outset of their English studies. Instead, rather than assuming that students need all 2,000 GSL words and all 570 AWL words, plus any additional words frequently occurring throughout a particular target domain, they suggest it may be more useful to start by asking what words a list user will need familiarity with in order to interact successfully in their target academic environment. The list produced in answering this question will certainly overlap with general lists such as the GSL or AWL, but not all GSL or AWL words would be represented. As a result, they note, the vocabulary learning load would be less burdensome. Learners could narrow their focus to only those words which they will frequently encounter in their future language use domain, thus eliminating the noise of words which are not so relevant to their studies. A brief survey of GSL words readily supports this conclusion. For example, political words such as president, governor, or political might not be useful for students outside of political science or history, and it is not difficult to think of disciplines in which other GSL words, such as chimney and gaiety, might not make a single appearance. In addition, the criterion "not on the GSL" rests partly on the assumption that students in all contexts already control foundational vocabulary. This assumption, however, may not be substantiated (Laufer, 1997). Further, careful analysis of words on a final list may bring to light certain general or foundational words that have specialized uses in academia (e.g., Baker, 1988). Such information, they suggest, would have obvious benefits for a learner whose language comprehension or production is hindered by incomplete facility with general or foundational vocabulary.


In sum, we can see the vocabulary of primary interest for instructional focus (i.e., not including low-frequency vocabulary) in an EAP context as being classified into anywhere from one to three tiers:

A Three-Tier Model (e.g., Coxhead & Hirsh, 2007)

Tier 1: Words frequently occurring across a wide range of texts from a wide range of topics and registers represented in a general corpus (e.g., the GSL)

Tier 2: Non-GSL words frequently occurring across a wide range of texts from a wide range of topics (i.e., macro- and sub-disciplines) and registers represented in a general academic corpus (e.g., the AWL)

Tier 3: Non-GSL and non-AWL words frequently occurring across a wide range of texts from a wide range of topics and registers represented in a discipline-specific academic corpus

A Two-Tier Model (e.g., Chen & Ge, 2007; Martinez et al., 2009; Vongpumivitch et al., 2009; Wang, Liang, & Ge, 2008)

Tier 1: Words frequently occurring across a wide range of texts from a wide range of topics and registers represented in a general corpus (e.g., the GSL)

Tier 2: Non-GSL words frequently occurring across a wide range of texts from a wide range of topics and registers represented in a target domain (regardless of prior classification)


A One-Tier Model (e.g., Ward, 1999; 2009)

All words frequently occurring across a wide range of texts from a wide range of topics and registers represented in a target domain (regardless of prior classification, e.g., on the GSL or AWL)

Traditionally, the most common means of classifying vocabulary for pedagogical purposes has been the three-tier model. From a pedagogical perspective, this model does make sense for many contexts, particularly those in which learners either have not yet identified their target discipline or have different target disciplines. This is the case in many university-based intensive English programs in the United States. With vocabulary classified this way, tiers one and two would likely be equally relevant to all learners. A third, more specialized tier (or set of tiers) could then be the focus of additional independent study for individual students. In contrast, the one- and two-tier models might be more appropriate for focusing vocabulary learning efforts for a set of learners with very specific, shared targets (e.g., in an English for specific academic purposes context).

The increasing popularity of research methodologies employing the one- and two-tier models stems from decreasing confidence in the generalizability of previously proposed lists, particularly the GSL and/or AWL. A two-tier model accepts the GSL as foundational and generalizable, regardless of domain; however, it assumes that an academic word list can be better focused for a given discipline. A one-tier model accepts no general word list, whether of general English (e.g., the GSL) or general academic English (e.g., the AWL), as equally useful across disciplines. Rather, it seeks to identify only those words that are immediately useful to a given set of focused learners.


A drawback of a two- or one-tier model, however, is its restricted usefulness. Ward's (1999; 2009) and Mudraya's (2006) efforts, for example, are of great use to undergraduate engineering students, but would be of little use to students in other disciplines. A strength of these models, however, in addition to concentrating effort for a focused set of users, is that they limit (two-tier model) or eliminate (one-tier model) any existing shortcomings of previously proposed lists.

2.4 Operationalizing Representativeness in Corpus Design

One final, fundamental issue that must be addressed in this literature review is the notion of representativeness in the corpora upon which word lists have been based. As Schmitt (2010) importantly and succinctly states, "A [word list] is only as good as the corpus it is based upon, and every corpus has limitations. Firstly, no corpus can truly mirror the experience of an individual person; rather it is hopefully representative of either the language across a range of contexts... or of a particular [domain] of language" (p. 67).

Regardless of the scope of inquiry, whether researchers are interested in identifying important words in general, academic, or discipline-specific language, and regardless of the word selection criteria used, commonly used word lists tend to be corpus-based. That is, corpora of varying dimensions are constructed to represent the domain of interest, and word lists are then culled from these corpora. In the many studies that have produced word lists, researchers describe, to varying degrees, their attempts to design corpora which mirror the experience of eventual list users. For example, they often note the source of the texts, the topics covered in these texts, and the size of the corpus (e.g., number of texts included, number of total running words).


Once these considerations are noted and applied, however, further analysis does not occur. Instead, these researchers tend to jump straight to the creation of word lists, satisfied that the corpora upon which they are basing their lists are "hopefully representative" of their target domain. Additional assessment of the extent to which their corpora truly are representative of the domain does not occur. This omission is striking, considering the important instructional and learning decisions that rely on these word lists. This section provides an overview, far from exhaustive, of commonly noted and applied corpus representativeness considerations in word list studies. More importantly, this section concludes by highlighting fundamental considerations in corpus design that are not taken into account.

Corpus linguistics manuals and methodological papers (e.g., Atkins, Clear, & Ostler, 1992; Biber, 1993; Biber, Conrad, & Reppen, 1998; Bowker & Pearson, 2002; McEnery & Wilson, 1996; McEnery, Xiao, & Tono, 2006) discuss a number of important considerations for achieving representativeness through corpus design. Following is a discussion of the considerations noted in these works, as well as how they have been operationalized in the design of corpora for vocabulary studies.

2.4.1 Domain topic coverage

Most texts on corpus linguistics note the importance of topic or subject coverage in corpora upon which lexical studies are conducted (e.g., Bowker & Pearson, 2002; Biber, 1993; Biber, Conrad, & Reppen, 1998). Clearly, this consideration is a key component of representativeness, as "subject matter is especially important for lexicographic studies, since the frequency of many words varies with the subject matter" (Biber, Conrad, & Reppen, 1998, p. 248).


In corpus-based investigations of academic vocabulary, domain topic coverage is perhaps the most often noted evidence for representativeness, and it is typically discussed in great detail. In academic corpus design, topic has been operationalized at varying levels and combinations of specificity, for example, macro-discipline (e.g., science), discipline (e.g., biology), or subfields within a discipline (e.g., hematology, hepatology, oncology). A number of methods have been used as evidence for topic coverage. For example, both Coxhead and Hirsh (2007; a pilot science-specific word list) and Durrant (2009; a collocation list for general EAP) ensured topic coverage in their academic corpora by designing their corpora based on the disciplinary makeup of their schools. Mudraya (2006) culled a list of engineering lexis from materials representing the nine courses required of the target list users. Wang, Liang, and Ge (2008) constructed their list of academic medical words from a corpus based on a survey of medical subfields represented in a database of academic medical journals.

2.4.2 Domain text type / register coverage

Corpus linguistics manuals also note the importance of including the range of text categories (i.e., genres, registers, text types) found in target domains (e.g., Bowker & Pearson, 2002; Kennedy, 1998; Sinclair, 1991). Depending on the ultimate goal, a corpus designer may try to balance spoken and written texts, or even varying types of spoken encounters or written texts. For example, for its spoken component, the designers of the BNC were careful to include both unscripted conversation (40%) and more formal, often pre-planned, task-oriented oral language such as lectures, sermons, and television or radio broadcasts (60%) (Leech et al., 2001). Such considerations are also illustrated in the careful design of specialized corpora, including the Educational Testing Service's TOEFL 2000 Corpus of Spoken and Written Language (T2K-SWAL; Biber et al., 2004), whose register categories were selected from the range of spoken and written activities associated with academic life (p. 7); Hyland and Tse's (2007) academic corpus, designed to represent the range of sources students are often asked to read at university from the main academic discourse genres (pp. 238-239); and the Hong Kong Financial Services Corpus (HKFSC), whose 25 text types were felt to represent a comprehensive picture of the written discourse of the financial services industry in Hong Kong (Li & Qian, 2010).

2.4.3 Quality / relevance of texts sampled

The quality and relevance of sampled texts, that is, the degree to which included texts are actually encountered and/or commonly used in the target domain, is yet another corpus-design consideration typically noted. Once again, various methods have been used as evidence of text quality and relevance, from disciplinary experts' provision or recommendation of texts (e.g., Coxhead & Hirsh, 2007) to claims regarding the reputation of the database (Wang, Liang, & Ge, 2008) or journals (Vongpumivitch et al., 2009) from which texts were selected.

2.4.4 Corpus size

Academic corpus designers also note the size of their corpora with respect to the number of texts and number of total running words compiled, essentially reflecting the maxim that the larger the corpus, the better. Bowker and Pearson (2002), however, acknowledge that "there are no hard and fast rules that can be followed to determine the ideal size of a corpus" (p. 45). The lack of standard rules of thumb regarding size is likely influenced by two issues: a.) the distributional characteristics of the features of interest (e.g., their frequency or rarity of occurrence); and b.) the scope of the domain to be represented (e.g., NY Times sports section articles vs. academic writing).


In general, larger corpora are required to capture less frequently occurring features (e.g., much specialized or low-frequency vocabulary) and to represent broader domains. Thus, corpus designers are offered somewhat, though understandably, vague guidance on corpus size, for example:

"a corpus should be as large as possible" (Sinclair, 1991, p. 18)

"a corpus needs to contain many millions of words" (Sinclair, 1991, p. 19)

"it is important to have a substantial corpus if you want to make claims based on statistical frequency" (Bowker & Pearson, 2002, p. 48)

"lexicographic work requires the use of very large corpora" (Biber, Conrad, & Reppen, 1998, p. 25) comprising "many millions of words" (p. 249)

2.4.5 Additional Considerations

Many other representativeness considerations have been noted and applied in corpus design, including authorial diversity (i.e., the wider the diversity, the better) and the completeness of sampled texts. Additionally, a critical consideration often noted is the balance of many of the variables noted above, including the balance of total running words or texts per topic, discipline, or genre (Hyland & Tse, 2007, p. 8), or even a balance of texts of varying length (Coxhead, 2000, p. 221).

2.4.6 What Evidence for Representativeness is Missing?

"Representativeness refers to the extent to which a sample includes the full range of variability in a population" (Biber, 1993, p. 243).

All of the corpus representativeness issues noted in the previous section share at least one common characteristic: they are primarily external criteria. That is, while they help to ensure some degree of ecological validity (i.e., textual, topical, and register representativeness), they may not ensure what our corpora are ultimately designed to achieve: representativeness of the lexical variability in our target domain. McEnery, Xiao, and Tono (2006) note, and many others concur (e.g., Atkins, Clear, & Ostler, 1992; Sinclair, 1991), that external criteria should indeed be "the primary parameters for the selection of corpus data" (McEnery et al., 2006, p. 14). Basing corpus design on internal, linguistic features of texts, it is argued, would bias the corpus design and not allow researchers to discover naturally occurring linguistic feature distributions (p. 14). It is also generally acknowledged, however, that determining the representativeness of a corpus should be a recursive endeavor based on corpus-internal evidence (Biber, 1993). That is, while external criteria such as topic coverage may guide the initial design of a corpus, there should be discrete stages of "extensive empirical investigation" (p. 256) of a pilot corpus, and the corpus design should be revised as necessary. Unfortunately, this important step in the validation of corpus representativeness, validation based on evidence that a corpus indeed represents "naturally occurring linguistic feature [i.e., vocabulary] distributions" (Biber, 1993, p. 243), does not often occur.

According to Atkins, Clear, and Ostler (1992), "...a corpus selected entirely on external criteria would be liable to miss significant variation among texts since its categories are not motivated by textual (but by contextual) factors" (p. 5). Biber (1993) concurs, noting that, while it is certainly crucial to consider situational variables which may have an effect on feature distribution (e.g., topic, register), and to use these variables to inform corpus design, ultimately, the variability we are interested in is not simply variability in these external variables (e.g., topic or register). Rather, we are interested in representing the distribution of linguistic features (e.g., lexical variability) within our "population" (i.e., our language use domain). How do we know that our corpus has indeed represented this "full range of [lexical] variability" (p. 243) without testing this assumption?

2.5 Conclusions

The tremendous amount of research discussed above has led to important advances in our understanding of vocabulary distributions across domains of different scope. It is now clear that the lexical challenge is even greater than had previously been proposed. Our more complete understanding of this challenge has been possible to a large extent because of corpus-based vocabulary research. Our growing appreciation of the word coverage required for successful comprehension has also allowed us to better understand this challenge. The sheer size of this challenge highlights the need for the most impactful, most reliable word lists that we can produce in order to help maximize vocabulary teaching and learning efforts. The word lists created by pioneers such as Thorndike and Lorge (1944) and West (1953), and more recent lists created by Xue and Nation (1984) and Coxhead (2000), have done a great deal toward focusing these efforts. Further, research by Hyland and Tse (2007), and subsequent studies on disciplinary vocabulary (e.g., Martinez et al., 2009; Wang, Liang, & Ge, 2008), has highlighted a possible need for even more focused efforts. Specifically, this research suggests a considerable amount of cross-disciplinary variation in lexical use and distribution. Simply stated, different disciplines use different vocabulary with different frequency.

Further, we also know that there is a notable difference in the lexical diversity employed by different disciplines; that is, some disciplines make use of more words than others do (Biber, 2006). This disciplinary variation calls into question the extent to which we truly understand the lexical variability in academic writing, and it should temper the strength of the conclusions we have drawn based on academic corpora.

With regard to corpus representativeness, the construction of academic corpora used for lexical analyses clearly demonstrates a concern for corpus design, particularly with regard to achieving range across disciplinary variation (i.e., in the case of general academic word lists) or topic variation (i.e., in the case of discipline-specific word lists), range of authors and text types, and some sort of balance of these variables. What is missing, however, is evidence that the corpora upon which lists are based represent the lexical diversity and distributions of their target domains. As a result, we lack necessary evidence that the word lists designed from these corpora are valid and reliable. Yet despite this critical gap in our evidence, influential conclusions continue to be drawn, leading to important curricular decisions.

Introducing corpus-internal variables to our process of validating corpus representativeness would, at minimum, add support for conclusions drawn from corpus-based lexical studies. There also exists the possibility that, through this additional analysis, we might discover profound deficiencies in corpora that have been designed for academic vocabulary research, and, in turn, deepen our understanding of lexical variability in academic writing. Ideally, this deeper understanding will inform our understanding of the lexical challenge that learners face, the process of word list creation, and the uses to which we put these word lists.


The following chapters detail an attempt to understand and describe the lexical diversity of one restricted domain, introductory psychology textbooks, and to determine the size and composition of a corpus needed to adequately represent the lexical variability in this domain.


Chapter 3 Methodology

This chapter begins by providing an overview of the research design and related goals for this dissertation (Section 3.1). Next, it details the design and construction of the Introductory Psychology Textbook Corpus, hereafter referred to as the PSYTB corpus (Section 3.2). Section 3.3 then operationalizes, and provides a rationale for, the unit of analysis used in the study. Finally, Section 3.4 briefly outlines the procedure used for the vocabulary analyses. Additional details of the vocabulary analyses are provided in Chapters 4 and 5.

1. The first goal is purely descriptive: to provide an account of the lexical diversity in a restricted register (i.e., introductory textbooks) in one academic discipline (i.e., psychology).

2. The second goal could be classified as quasi-experimental: to identify the smallest, most efficient sample that can capture a stable, reliable list of important words from one restricted register (i.e., introductory textbooks) in one academic discipline (i.e., psychology).


The primary motivation for the first goal is in line with previous studies that have sought to understand the vocabulary challenge posed by different tasks that learners hope to accomplish in English. It is only by understanding the size of this challenge (i.e., the lexical diversity encountered) that we are able to develop curricular and learning goals. A secondary, but related, motivation for this goal is to understand the vocabulary challenge posed by this particular domain, introductory psychology textbooks, in relation to the proposals of previous studies (e.g., Hazenburg & Hulstijn, 1996; Laufer & Ravenhorst-Kalovski, 2010; Nation, 2006). These studies have proposed a wide range of figures for the size of vocabulary required for adequate reading comprehension, anywhere from approximately 6,000 (Laufer & Ravenhorst-Kalovski, 2010) to 10,000 (Hazenburg & Hulstijn, 1996) word families. However, none of these studies directly assessed the vocabulary demands of required reading in a U.S. university context. Nation (2006), for example, did not assess the vocabulary demands of academic reading at all. While Hazenburg and Hulstijn's study did exactly that, their finding was that 10,000 headwords were the minimum required for academic reading in Dutch, not English. Laufer and Ravenhorst-Kalovski's (2010) study of vocabulary in academic reading was based on English language readings from a national Israeli university entrance exam; we do not know the extent to which the texts on this exam reflect the texts encountered by undergraduate students in a U.S. university context. Thus, addressing the first goal is a step toward understanding the needs of a specific set of learners: pre-matriculation undergraduate U.S. university students.


More broadly, the investigation related to this goal will provide insights regarding the validity of previously proposed estimates of the lexical challenge.

The second goal of this study addresses what is perhaps a much more fundamental methodological issue in corpus-based lexical studies. As noted in Chapters 1 and 2, a great deal of faith has been placed in corpus-based word lists. This faith has rested on the assumption that the corpora upon which these word lists have been based truly represent the lexical distributions in the domains that they are designed to represent. However, no corpus-based lexical studies to date have assessed the reliability of these assumptions. Rather, it has been assumed that if corpora are large and include texts from the range of topics and registers existing in a target domain, word lists culled from these corpora should be reliable representations of the important words from the target domain. Despite the lack of critical assessment of these assumptions, important curricular, instructional, and learning decisions have been made, and time and money have been invested in designing and purchasing materials based on word lists.

The primary motivation behind the second goal, then, is to critically assess assumptions regarding the lexical representativeness of the corpora upon which word lists are based. Specifically, the series of experiments designed toward meeting this goal are meant to assess the degree to which samples reliably reflect the lexical variability in a target domain, and are thus able to capture the important words from that domain. In the present study, my target domain is introductory psychology textbooks. By investigating the corpus required to capture the important words in this restricted domain, I hope to inform those who wish to design corpus-based lists of important words for instructional and learning purposes.


This chapter proceeds by detailing the methods used to address these goals.

3.2 Constructing the PSYTB Corpus

Section 3.2 outlines the design and construction of the Introductory Psychology Textbook Corpus (PSYTB corpus). It explains why textbooks were chosen as a focus, how and why whole textbooks were sampled, and how the textbooks were processed into electronic format.

3.2.1 Rationale for focus on textbooks

There are a number of written registers that students encounter in their university studies, including lab manuals, course catalogues, program-of-study brochures, and class management texts such as syllabi (Biber, 2006). Despite this variety, Johns (1997) notes that, particularly in the first two years of undergraduate study, "In many classrooms, the textbook is the chief reading source, the single window into the values and practices of a discipline" (p. 46). Carkin (2001; 2005) similarly suggests that the primary reading material that students will encounter during their first year of university study is instructional textbooks.

Empirical research directly supporting Johns' (1997) and Carkin's (2001; 2005) claims of the primacy of the textbook in lower-division undergraduate coursework, however, is lacking. Toward the goal of understanding the literacy demands of university study, there does exist a set of studies which have sought to identify faculty perceptions of the skills most important to student success in their classes. In these studies, the importance of reading skills is consistently highlighted (e.g., Darrell, 1980; Grant & Ginther, 1996; Johns, 1981; Rosenfeld, Leung, & Oltman, 2001).


Unfortunately, however, research related to the actual reading materials assigned in these classes is much more limited. Some studies have noted the types of reading texts assigned, but in a very limited and somewhat vague manner. For example, Darrell (1980) noted the percentage of instructors surveyed who had students read entire texts, periodical articles, chapters from textbooks, or selected chapters from other books. From these data, we can get an idea of how popular each text type (e.g., periodical articles vs. chapters from textbooks) was, based on the number of instructors assigning them, but we do not know the relative proportion of required reading that these types accounted for. Further, the classifications of types used in his study (e.g., "entire texts," "periodical articles") are somewhat vague and offer limited information regarding what the texts actually were. In a separate survey conducted by Johns (1981), respondents were asked to rank reading textbooks, reading multiple-choice examination questions, reading essay examination questions, and completing non-textbook reading assignments (e.g., journals in the field) in terms of their necessity for academic success. Though we might be able to infer the relative importance of these different required readings from this ranking, unfortunately, she does not report the responses to this question.

Thus, in an attempt to validate claims of the importance of textbooks in introductory coursework, I designed and delivered a short survey to instructors of introductory psychology courses. The results of this survey directly informed the corpus design for this dissertation. The survey and its results are detailed below.

A brief, four-item survey was sent to 84 instructors of introductory psychology at 28 colleges and universities across the United States.


The primary tool used in selecting these 28 institutions was the Carnegie Classification of Institutions of Higher Education, a classification system devised by the Carnegie Foundation Commission on Higher Education for the purpose of research and policy analysis, and "the leading framework for describing institutional diversity in U.S. higher education" (Carnegie, n.d.; Appendix A). The most current incarnation of this framework is based on data collected between 2003 and 2004 from the National Center for Education Statistics, the National Science Foundation, and the College Board. According to the Carnegie Foundation website, "The instructional program classification is based on three pieces of information: the level of undergraduate degrees awarded (associate's or bachelor's), the proportion of bachelor's degree majors in the arts and sciences and in professional fields, and the extent to which an institution awards graduate degrees in the same fields in which it awards undergraduate degrees" (Carnegie, n.d.).

For the current study, two institutions were selected from each of the 14 Undergraduate Instructional Program Classifications which include bachelor's degree-granting institutions. Appendix A provides a detailed description of each of the 14 classifications. In addition to the instructional program classification information, care was taken to select a reasonable balance of public and private institutions as well as institutions representing a variety of geographical regions in the United States. The list of selected academic institutions is provided in Appendix B.

Twenty-six instructors completed the survey, for a response rate of 30.95%. Aside from being asked for basic demographic information, including the region of the school (e.g., northeast, southwest) and type of institution (e.g., private 4-year college, public university), instructors were asked to identify the types of reading that they require of their students (e.g., course textbooks, academic journal articles) as well as what percentage of assigned reading could be classified as these different types.


Table 2 summarizes the responses to the latter two questions, which dealt specifically with the types of texts assigned.

Table 2
Responses from Required Reading Survey

Which of the following types of reading do you require of your students in your introductory course? Please select all that apply.

    Types of Reading                                                 Percentage of Respondents
                                                                     Selecting Text Type
1   Course textbook (i.e., a book aiming to provide a survey
    of core topics in your discipline)                               100%
2   Nonfiction, "non-textbook" academic books or book chapters
    (e.g., ethnographies, case studies, collections of essays)       15%
3   Popular media (e.g., news magazine or newspaper articles)        12%
4   Biographies or autobiographies                                   0%
5   Works of fiction (e.g., novels)                                  4%
6   Academic journal research articles                               46%
7   Online sources (e.g., websites). Please describe.                12%
8   Other types of reading texts. Please describe.                   8%

In your introductory course, approximately what percentage of assigned reading comes from each of the following sources?

    Types of Reading                                   Min %   Max %   Mean %   SD
1   Course textbook                                    70      100     91.00    9.90
2   Nonfiction, "non-textbook" academic books
    or book chapters                                   0       25      1.58     5.24
3   Popular media                                      0       15      1.04     3.47
4   Biographies or autobiographies                     0       0       0.00     0.00
5   Works of fiction                                   0       20      0.77     3.92
6   Academic journal research articles                 0       20      4.38     6.65
7   Online sources (e.g., websites)                    0       10      1.15     2.94
8   Other types of reading texts                       0       2       0.08     0.39


The fact that 100% of respondents reported using a traditional, introductory "survey" textbook, and that, on average, over 90% of the reading that they reported assigning comes from these textbooks, clearly supports claims and intuitions of the course textbook being the "chief reading source" (Johns, 1997, p. 46) for introductory psychology classes. While 46% of respondents noted requiring students to read academic journal articles, on average, this type of reading accounts for less than 5% of assigned reading. As a result, I feel confident that a corpus composed entirely of textbooks represents the overwhelming majority of assigned reading in introductory psychology classes.

3.2.2 Textbook Sampling

Two complementary methods guided the selection of the 10 textbooks comprising the PSYTB corpus: a.) a survey of textbooks used in introductory psychology classes at the 28 tertiary academic institutions used for the reading materials survey; and b.) a survey conducted by the College Board's College-Level Examination Program (CLEP) of psychology textbooks commonly used in colleges and universities (The College Board, 2010). Of the 10 books selected, five were identified by both my survey and CLEP's survey, three were identified only by CLEP's survey, and two were identified only by my survey.

3.2.3 Rationale for sampling whole textbooks

The design of corpora varies to a great extent with regard to sampling method. For example, three corpora from which Coxhead selected texts for part of her Academic Corpus, the Brown Corpus (Francis & Kucera, 1982), the Wellington Corpus (Bauer, 1993), and the LOB Corpus (Johansson, 1978), were each made up of 500 x 2,000-word


samples from texts. According to McEnery and Wilson (1996), "with regard to sample lengths, taking samples of sizes which are representative of that feature should mean that the samples are also representative of those features which show more variation in distribution" (p. 66). A strong argument has been made for sampling full rather than partial texts for corpora used for lexical studies, as linguistic features have been shown to vary between different sections of texts. Thus, Hyland and Tse's (2007) academic corpus and many contemporary specialized corpora are constructed of complete texts. Because of the seemingly wide but presently unknown variation in vocabulary distributions within academic texts, I conducted pilot research with two textbooks from the PSYTB corpus to determine what would be gained, and lost, if different proportions of books were sampled. The procedure for this brief pilot study was as follows. First, approximately 25% of the chapters (i.e., four chapters) were sampled from each textbook. For each book, five different samples of four chapters were taken: the initial four chapters, and then four random samples of four chapters. A mean and standard deviation were calculated for the number of different lemmas and the proportion of total lemmas across the five samples in each of the two textbooks. The same process was then repeated with approximately half of each book (eight chapters per book). Table 3 shows the results of these experiments.


Table 3 Proportion of Total Lemmas in Different Book Proportions Sampled

Proportion of Book Sampled     Lemmas, Mean (SD)      Proportion of Total Lemmas in Book, Mean (SD)
1/4 of book (4 chapters)
  PSY_4                        5441.40 (238.30)       0.4710 (0.03)
  PSY_5                        8008.60 (461.98)       0.4679 (0.02)
1/2 of book (8 chapters)
  PSY_4                        8110.00 (154.57)       0.7031 (0.03)
  PSY_5                        11952.40 (475.45)      0.6974 (0.01)

As can be seen in Table 3, on average, approximately 47% of the lemmas in each textbook appear in one quarter of the book, and 70% of lemmas appear in one half of the textbook, regardless of whether the sample comprised the initial chapters of the textbook or a random selection of chapters equaling one quarter or one half of the book. In both textbooks, then, nearly half of all lemmas are introduced in the first quarter of the book, and approximately 30%⁵ of the lemmas are introduced in the second half of the book. In other words, if a sampling scheme were based on quarter or half books, regardless of the combination of chapters selected to comprise these samples, we would lose approximately 50% or 30%, respectively, of the lexical diversity in a book. Thus, although a substantial proportion of lexical variation can indeed be captured by sampling parts of textbooks, a great deal of variation is not represented. It was thus decided that, in order to more accurately and completely represent the lexical diversity that introductory psychology students encounter, whole textbooks would be sampled.
⁵ This finding mirrors the finding by Biber (2006) in a set of experiments that he ran with a much larger and more diverse corpus of written and spoken academic language. He found that halving the total running words and texts in a corpus produced approximately 70% of the lemmas in the total corpus.
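For concreteness, the pilot procedure above can be sketched in a few lines of code. The following is a minimal illustrative sketch, not the program actually used: the lemma_set helper is hypothetical, standing in for the tagger-based lemmatization described in Section 3.4.1.

```python
import random
import statistics

def lemma_set(chapter_path):
    """Hypothetical helper: return the set of distinct lemmas in one
    POS-tagged chapter file (the real analysis derived lemmas from the
    Biber Tagger's output; see Section 3.4.1)."""
    raise NotImplementedError

def five_samples(chapters, k, seed=1):
    """The five samples drawn per book: the initial k chapters plus
    four random draws of k chapters."""
    rng = random.Random(seed)
    return [chapters[:k]] + [rng.sample(chapters, k) for _ in range(4)]

def sample_statistics(chapters, k):
    """Mean and SD of the number of distinct lemmas per sample, and of
    the proportion of the book's total lemmas each sample captures."""
    book_lemmas = set().union(*(lemma_set(c) for c in chapters))
    counts, proportions = [], []
    for sample in five_samples(chapters, k):
        lemmas = set().union(*(lemma_set(c) for c in sample))
        counts.append(len(lemmas))
        proportions.append(len(lemmas) / len(book_lemmas))
    return (statistics.mean(counts), statistics.stdev(counts),
            statistics.mean(proportions), statistics.stdev(proportions))
```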


Without capturing the lexical diversity possible in all samples, I would have less certainty that my corpus was representing either the vocabulary challenge posed by, or the lexical distributions found in, introductory psychology textbooks.

3.2.4 Textbook processing

All textbooks were either acquired in electronic form or scanned into electronic form. All chapters were saved as separate files, and all files were part-of-speech tagged using the Biber Tagger (1988) to facilitate lemmatizing (see Section 3.3). Front matter (e.g., publication information, tables of contents, forewords, and introductions) as well as appendices, indexes, and bibliographies from the textbooks were not included in the text files.

3.2.5 The Introductory Psychology Textbook Corpus

Table 4 outlines the design of the corpus used in the current study, the Introductory Psychology Textbook Corpus (PSYTB). "Total running words" refers to the total number of orthographic words in the corpus; it is not a count of different word forms, lemmas, or word families. To contextualize the PSYTB corpus in relation to other academic corpora, Table 5 compares my corpus design with Coxhead's (2000) Academic Corpus design. As can be seen from this table, at 3.1 million words, the PSYTB corpus is nearly as large as the 3.5 million-word Academic Corpus. Broken down into macrodisciplines, Coxhead's Academic Corpus included approximately 104 texts, or 875,000 words, representing all seven academic disciplines in the Arts, one of which was psychology. This means that she had approximately 15 texts, comprising approximately 125,000 words, representing the discipline of psychology. These texts represented not introductory psychology textbooks, but the entire


discipline of psychology. From this surface comparison, then, it would appear that the PSYTB corpus should be at least as representative of its target use domain as the Academic Corpus was of its target use domain.

Table 4 Design of the Introductory Psychology Textbook Corpus (PSYTB)

Textbook          Chapters    Total Running Words
PSY_1             15          324,201
PSY_2             14          310,121
PSY_3             15          291,904
PSY_4             16          268,808
PSY_5             16          403,589
PSY_6             18          302,878
PSY_7             14          227,130
PSY_8             17          351,705
PSY_9             18          341,860
PSY_10            14          282,693
Average           15.70       310,488.90
TOTAL (PSYTB)     157         3,104,889

Table 5 Comparison of the PSYTB with the Academic Corpus (Coxhead, 2000)

Point of Comparison   PSYTB                                      Academic Corpus (Coxhead, 2000)
Target domain         Introductory psychology textbooks          Academic writing encountered by university
                                                                 students in New Zealand
Corpus design         10 complete contemporary introductory      414 texts (mixture of whole texts and
                      psychology textbooks                       2,000-word text samples) from 28 academic
                                                                 disciplines
Total words           3.1 million words                          3.5 million words


3.3 The Unit of Analysis

Three important decisions had to be made regarding what to use as the unit of analysis for this study (Nation & Webb, 2010). First, I had to decide whether I would focus on single words or include multi-word units as well (Section 3.3.1). Then, I had to decide whether my analysis would be based on individual word forms, lemmas, or word families (Section 3.3.2). Finally, I had to decide what to do regarding proper nouns (Section 3.3.3).

3.3.1 Single-word or multi-word units?

For the present study, I chose to consider only single words as the unit of analysis. Despite the benefits of acknowledging and addressing a broader range of multi-word sequence types (as discussed in Chapter 2), the unit of analysis for the current study could be identified as Gardner's (2007) "zero level" (i.e., all orthographically single-word items, including closed compound nouns). There are two primary reasons for this. The first reason is simply practicality: the more multi-word unit types accounted for, the greater the computer programming challenge, and, in turn, the greater the chance for error. Second, and perhaps more importantly, the information potentially lost by not

accounting for multi-word items in my corpus is likely somewhat mitigated by the fact that relatively few of these items (including many phrasal and phrasal-prepositional verbs) are in fact idiomatic expressions with meanings which "cannot be predicted from the meanings of the parts" (Biber et al., 1999, p. 988). Indeed, Nation (2006) notes that "the number of truly opaque phrases in English is small, and they are infrequent" (p. 66). Further, and quite relevant to the purpose of the current study, he notes that, though multi-word sequences should be learned for successful production of English, because of


the relative rarity of idiomatic multi-word units, "for the receptive purposes of reading and listening, [such phrases] are not a major issue" (p. 66). Thus, I feel that using the single word as the unit of analysis still allows me to account for the overwhelming majority of the lexical challenge that will be faced by undergraduate, introductory psychology students. This decision is not meant to discount the value of addressing multi-word items, even for developing vocabulary for receptive purposes. However, such a focus was considered beyond the scope of this dissertation.

3.3.2 Word forms, lemmas, or word families?

While some semantic relationship between word family members cannot be denied (as noted in Chapter 2), the current study uses the lemma, instead of the word family, as the unit of analysis. The lemma is operationalized here using Francis and Kucera's (1982) definition: "a set of lexical forms having the same stem and belonging to the same major word class, differing only in inflection and/or spelling" (p. 1). That is, each lemma, or each base word and all of its inflectional variants (e.g., propose, v. = propose, proposes, proposed, proposing), is considered one lexical item. Derived variants (e.g., through affixation: propose > proposal; through conversion: increase as a verb > increase as a noun) are considered different lexical items. This decision has both benefits and drawbacks. In terms of drawbacks, the word lists created might be somewhat inflated with certain lemmas whose related meanings may be quite transparent. For example, increase as a verb and increase as a noun are counted as two words on the list, despite their identical orthographic form and seemingly transparent


semantic relationship. Having these words identified as two separate words may suggest a greater lexical challenge than perhaps exists. However, I feel that the benefits of the lemma as the unit of analysis outweigh the drawbacks for two primary reasons. First, using the lemma as the unit of analysis might be considered a more conservative estimation of the lexical challenge than would using a more inclusive unit such as the word family (e.g., as with the AWL). Using the lemma assumes some degree of knowledge of inflectional morphology on the part of list users, but comparably much less morphological knowledge than does using the word family. Thus, the lemma could be considered a compromise between the word form and the word family. Second, using the lemma as the unit of analysis can alleviate some potential issues with homonyms and polysemous words. As Grabe (1991) points out, "each word form may represent a number of distinct meanings, ... some of which are quite different from each other in meaning" (p. 392). Further, as Gardner (2007) notes, without taking polysemy or homonymy into account, word lists will "overestimate the true coverage of the word forms" by "underestimat[ing] the actual user knowledge required to negotiate the word forms" with different meanings (p. 253). Thus, perhaps the lexeme, or "a group of word forms that share the same basic meaning (apart from that associated with the inflections that distinguish them) and belong to the same word class" (Biber et al., 1999, p. 54), might be the ideal unit of analysis. However, the process of reliable semantic tagging demands a great deal of time and resources not available for the present study. This does present a limitation. For example, occurrences of gear (n.), meaning "an engine component," and gear (n.), meaning "specialized tools or equipment," would


all be identified as occurrences of the lemma gear (n.), despite the fact that their semantic difference should arguably lead to their classification as two separate lexical items. Fortunately, some comfort can be taken in the findings of Ming-Tzu and Nation's (2004) study of homography in the AWL, in which the researchers found such cases to be relatively rare (i.e., approximately 10% of AWL words had multiple meanings in their corpus). Additionally, even after considering homographs distinct words, only three of the 570 words would no longer meet the range and frequency criteria Coxhead (2000) had established. More importantly, however, they note that the homography was most frequently accounted for by part of speech. The use of the lemma as the unit of analysis will thus allow distinctions to be made among some homographs (e.g., mean as a verb vs. mean as an adjective vs. mean as a noun) that might otherwise be classified as members of the same family. If the lemma, rather than the word family, were used as the unit of analysis, they argue, many homographs would already be considered distinct lexical units, as they are in the current study.

3.3.3 Treatment of proper nouns

According to Biber et al. (1999), proper nouns are those nouns which are marked orthographically with an initial capital letter and denote personal names (e.g., Alan, Bond), geographical names (e.g., Australia, Hobart), names of objects (e.g., Drumbeat <a boat>), or institutions (e.g., the National Australia Bank) (p. 245). Though proper nouns may account for a great deal of text coverage (as high as 6.12% in newspapers; Nation, 2006), in studies attempting to understand the lexical burden involved in comprehension tasks, proper nouns are typically considered "known" (Nation, 2006; Schmitt, 2010), or at least "easily understood" or "having minimal learning burden"


(Nation, 2006, p. 70). As Cobb (2010) points out, if we include these words as lexical items, the lexical challenge appears much greater. He offers the example sentence "Pierre lives in Beaurepaire" to make this point, arguing that a reader who has never encountered the person or place name in this sentence might be assessed as knowing only 50% of the words in the sentence (far below the threshold required for adequate comprehension) when, in fact, the reader would likely be able to infer that "Pierre" is an animate being (e.g., a person) that is capable of living in something, and that "Beaurepaire" must be a place in which something can live. Thus, though the reader may infer only partial knowledge of these unknown words, the sentence "can be processed well enough to get the reader to the next sentence" (p. 187) even without knowing that Pierre is a male name and that Beaurepaire is a city in France. Subsequent context will likely provide more opportunities to deepen a reader's understanding of these words. Thus, as has been recent practice in discussions of the lexical challenge involved in text comprehension (e.g., Adolphs & Schmitt, 2003; Cobb, 2010; Nation, 2004, 2006; Schmitt, 2010), in the current study, the word coverage provided by proper nouns was assumed not to contribute significantly to the lexical challenge. So, for example, assuming that readers must understand vocabulary accounting for 98% text coverage, if the proper nouns in a text account for 4% coverage, I was interested in identifying only the lemmas accounting for the remaining 94% coverage. This method, it has been argued, provides a more accurate reflection of the learning burden (or lexical knowledge required) to achieve successful reading comprehension.


Proper nouns were identified by means of three rules in the vocabulary analysis program:
1.) If the BNC list identified a noun as a common noun, all occurrences of this noun in the corpus were tagged as such.
2.) If the BNC list identified a word as a proper noun, all occurrences in the corpus were tagged as such.
3.) If the Biber Tagger (1988) identified a word as a proper noun, all occurrences in the corpus were tagged as such, provided the word was not on the BNC list of common nouns and did not occur elsewhere in the corpus tagged as a common noun or with a word-initial lower-case letter.
For example, the title of Adelson-Goldstein and Shapiro's (2008) book, Oxford Picture Dictionary, would be analyzed as one proper noun, Oxford, and two common nouns, despite the fact that all three words begin with capital letters and are part of the same title. Similarly, the geographical name "Lake Michigan" would be labeled as a common noun, Lake, and a proper noun, Michigan. Because of these rules, there were instances where, for example, surnames which share spelling with common nouns (e.g., White, Bush, Green) were classified as common nouns, as such words will likely be on other word lists. Fortunately, as Cobb (2010) notes, this issue "arises fairly rarely," and, as there was no reliable way to address it programmatically, this error was accepted (p. 189).
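Expressed programmatically, the three rules amount to a small decision procedure. The sketch below is illustrative only: the tag labels ('NN' for common noun, 'NP' for proper noun) and the input format are assumptions, not details of the actual vocabulary analysis program.

```python
def classify_nouns(tokens, bnc_common, bnc_proper):
    """Apply the three proper-noun rules to a list of (word, pos) pairs.
    `bnc_common` and `bnc_proper` are sets of lower-cased nouns from the
    BNC lists; `pos` is the tagger's label ('NN' = common noun,
    'NP' = proper noun; these tag names are illustrative).
    Returns a dict mapping each noun to 'common' or 'proper'."""
    # Rule 3 precondition: words occurring anywhere in the corpus tagged
    # as common nouns, or with a lower-case initial letter.
    disqualified = {w.lower() for w, pos in tokens
                    if w[0].islower() or pos == 'NN'}

    labels = {}
    for word, pos in tokens:
        key = word.lower()
        if key in bnc_common:
            labels[key] = 'common'                     # Rule 1
        elif key in bnc_proper:
            labels[key] = 'proper'                     # Rule 2
        elif pos == 'NP' and key not in disqualified:
            labels[key] = 'proper'                     # Rule 3
        else:
            labels[key] = 'common'
    return labels
```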


3.4 Lexical Analysis Procedures

3.4.1 Vocabulary analysis program

The vocabulary analysis program used for this study was written by a professional computer programmer, Chris Grey, with the help of a dissertation grant from the journal Language Learning and a small projects grant from Arizona Teachers of English as a Second Language (AZ-TESOL). The program produced output almost identical to that produced by Heatley and Nation's (1994) Range program, in that it produced the frequency of every word in every text in the corpus. In the current study, frequencies are provided by textbook and by chapter, and textbook range (out of 10) and chapter range (out of 157) totals are provided as well (see Table 6 for a partial sample of this output). There are two major differences, however. First, unlike Range, the analysis program designed for this study allowed me to use the lemma as the basis of analysis. It was capable of reading the Biber Tagger's (1988) part-of-speech tags and of grouping all occurrences of inflectional variants of a word into a single lemma. For example, all occurrences of walk (v.), walks (v.), walked (v.), and walking (v.) were combined and noted as occurrences of the lemma walk (v.), whereas walk (n.) and walks (n.) were combined and noted as occurrences of the lemma walk (n.). Second, the program provided a part-of-speech tag and a previous classification tag (GSL1000, GSL1001-2000, or AWL; BNC1 to BNC14) for each lemma, allowing for the possibility of further analysis. The current study adopted the one-tier model of vocabulary classification discussed in Chapter 2. That is, word lists were based on frequency and/or range distribution, without regard to prior classification. This decision allows me to avoid


adopting any issues (e.g., with corpus design, word selection criteria, and resulting relevance of lexical items) that may exist with previously proposed lists. However, noting prior classification of words in my corpus will allow for comparisons between my lists and previously proposed lists.
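The lemma grouping performed by the program (Section 3.4.1) reduces to counting over (base form, word class) pairs. A minimal sketch follows, with base_form left as a hypothetical stand-in for the tagger-informed lemmatizer actually used:

```python
from collections import Counter

def base_form(form, pos):
    """Hypothetical helper mapping an inflected form to its base form,
    e.g., ('walked', 'v.') -> 'walk'. In the actual program this step
    was driven by the Biber Tagger's part-of-speech output."""
    raise NotImplementedError

def lemma_frequencies(tagged_tokens):
    """Group (form, pos) tokens into lemma counts. Because a lemma is a
    (base form, word class) pair, walk (v.) and walk (n.) are counted
    as two distinct lexical items, while walks/walked/walking tagged
    'v.' all count toward walk (v.)."""
    counts = Counter()
    for form, pos in tagged_tokens:
        counts[(base_form(form, pos), pos)] += 1
    return counts
```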


Table 6 Partial Sample of Output from Vocabulary Analysis Program

Lemma       POS   Class. (GSL/AWL)   Class. (BNC)   Freq. PSYTB   Book Range   Chapter Range   Freq. PSY_1   Freq. PSY_2   Freq. PSY_1_Chpt1   Freq. PSY_1_Chpt2
compare     v.    GSL1001-2000       BNC_1          826           10           113             62            81            5                   11
comparison  n.    GSL1001-2000       BNC_1          631           7            76              53            47            3                   7


3.4.2 The analyses

Two sets of analyses were conducted in this dissertation, each set related to one of the goals discussed in Section 3.1 of this chapter. Because of the complexity of the analyses, particularly with regard to analyses related to goal #2, the analyses are detailed to a much greater extent over the next two chapters, immediately preceding the results and discussion of each set of experiments. Only brief outlines of the analyses are provided in this section.

3.4.2.1 Analysis of lexical diversity

Analyses regarding lexical diversity (Chapter 4) were designed to address the question, "How many words do learners need in order to be able to read introductory psychology textbooks?" Initial analysis of the lexical diversity in introductory textbooks was conducted in the tradition of previously noted studies with the same basic goal (e.g., Adolphs & Schmitt, 2003; Nation, 2006). All lemmas in the corpus were identified and counted, and the total number of lemmas required to achieve 100% coverage of a text was noted. Then, all lemmas were ordered by frequency, and the number of lemmas required to provide 98% coverage, or the coverage demonstrated to be a reasonable threshold for adequate comprehension (Hu & Nation, 1992; Laufer & Ravenhorst-Kalovski, 2010; Schmitt, Jiang, & Grabe, 2011), was also noted. From this analysis, an estimate of the vocabulary challenge of introductory psychology textbooks is proposed. Next, an additional analysis was conducted to assess the validity of this proposed estimate. This analysis was designed to demonstrate the extent to which the PSYTB corpus captures the lexical diversity in the target domain: introductory psychology textbooks. Details of this analysis are provided in Chapter 4.


3.4.2.2 The effect of sample size and composition on representativeness of lexical variability

Chapter 5 details the analyses of the extent to which samples are able to reliably represent the lexical variability in the target domain: introductory psychology textbooks. First, a variety of range and frequency criteria were applied to the whole corpus of 10 textbooks, creating several word lists. Then, the same criteria were applied to samples from this corpus, and the lists produced were compared to the lists culled from the whole corpus. Through these comparisons (that is, through analysis of the stability of the word lists), I am able to assess the degree to which the samples are able to capture the lexical distributions of the whole corpus. Further, I am able to discuss the reliability of word lists produced from samples of various sizes and compositions. This analysis not only informs the strength of the claims I can make regarding the reliability of these important introductory psychology word lists; it also provides broader insights into the kind of corpus required to reliably represent the lexical variability in a target domain. Specifically, findings highlight the crucial need for reflection regarding assumptions of the lexical representativeness of corpora that have been designed for lexical analysis. In turn, findings raise the need to question the very reliability of word lists that have been based on these corpora.
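At their core, the list comparisons described here are set operations over the word lists. A minimal sketch of the comparison step, under the assumption that each list is represented as a set of lemmas (the metric names are mine, not terms used in the analyses that follow):

```python
def list_stability(whole_list, sample_list):
    """Compare the word list derived from a sample against the list
    derived from the whole corpus (both given as sets of lemmas)."""
    shared = whole_list & sample_list   # captured by both lists
    missed = whole_list - sample_list   # important words the sample failed to capture
    extra = sample_list - whole_list    # words only the sample flagged as important
    return {
        'recall': len(shared) / len(whole_list),  # proportion of the whole-corpus list recovered
        'missed': sorted(missed),
        'extra': sorted(extra),
    }
```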


Chapter 4 Results and Discussion Part 1: Lexical Diversity in Introductory Psychology Textbooks

In order to understand the potential lexical challenge for English learners who will be required to read an introductory psychology textbook, it is necessary to understand the lexical diversity that they might encounter in this type of book. Section 4.1 discusses the lexical diversity across textbooks in the PSYTB corpus. Section 4.2 then extends the investigation to address a much larger question: Does the PSYTB corpus represent the lexical diversity in the domain that it is intended to represent, introductory psychology textbooks? The answer to this fundamental question has direct implications for the validity of any estimates of the lexical challenge students might encounter in introductory psychology textbooks. More importantly, however, the answer to this question has direct implications regarding previously proposed estimates of the lexical challenge posed by different language use domains (e.g., spoken discourse: Adolphs & Schmitt, 2003; television program viewing: Webb & Rogers, 2009a; reading of novels: Nation, 2006; academic reading: Hazenberg & Hulstijn, 1996). Because this fundamental question has not been asked of the corpora upon which these estimates have been based, we really have no evidence to support the validity of these estimates. In fact, as analyses in this chapter will demonstrate, there is strong evidence to question the accuracy of these estimates.

4.1 Lexical Diversity across Textbooks in the PSYTB Corpus

Lexical diversity was calculated in the following way. First, the lemmas in each textbook were rank ordered by frequency, from the most frequently occurring lemma


through the least frequently occurring lemma. Next, the percentage of total word coverage was calculated for each lemma (i.e., the number of occurrences of each lemma divided by the total number of running words in a textbook). Finally, the word coverage percentages were summed, beginning with the most frequently occurring lemma, then adding the second most frequently occurring lemma, etc., until the percentages equaled the desired coverage (i.e., 100%, 98%, 95%, and 90%, respectively). This same process was completed both with and without proper nouns. For calculations without proper nouns, proper nouns were first removed from the list and their total word coverage percentage was calculated. Then, I calculated the number of lemmas required to achieve the desired coverage minus the coverage provided by the proper nouns. For example, if proper nouns accounted for 4% word coverage of a textbook, I would calculate the total number of lemmas required to achieve my desired coverage minus 4% (e.g., 98% - 4% = 94%). This same procedure was also used to determine the lexical diversity across the whole corpus. Table 7 shows the number of lemmas required to reach different levels of total word coverage (e.g., 100%, 98%, ...) in each of the 10 textbooks in the PSYTB corpus, as well as for the whole corpus. Figures are provided for total lemmas including proper nouns (+ prns) and for total lemmas excluding proper nouns (- prns). From this table, we can see that there is a great deal of variation in the number of lemmas that students will encounter in their introductory psychology textbook, depending on which book is chosen by an instructor (i.e., anywhere from 11,227 to 17,353 total lemmas). It is important to reflect on the wide range in lexical burden evidenced in these 10 textbooks. If one assumes that proper nouns do not add significantly to the lexical burden (i.e., because


they are either known or made evident through the text), and that a reasonable threshold for adequate comprehension is 98% coverage, a student would need to know anywhere from 5,627 to 8,733 lemmas for a chance at reasonable comprehension of their introductory psychology textbook, depending on which of these textbooks is required in their class.

Table 7 Number of Lemmas Required to Achieve Different Levels of Word Coverage (by book)

            100% coverage        98% coverage         95% coverage         90% coverage
Textbook    + prns    - prns     + prns    - prns     + prns    - prns     + prns    - prns
PSY_1       13707     11681      8581      6753       5718      3890       3957      2129
PSY_2       14599     12270      9312      7378       6210      4276       4289      2355
PSY_3       16679     13753      11143     8733       7713      5303       5378      2968
PSY_4       11227     9591       6950      5627       4658      3335       3227      1904
PSY_5       17353     14048      10742     8075       7268      4601       5234      2567
PSY_6       13373     10957      8362      6414       5585      3637       3944      1996
PSY_7       12425     10247      8264      6465       5639      3840       3950      2151
PSY_8       15741     12962      9565      7611       6358      4404       4431      2477
PSY_9       15971     13073      9896      7768       6565      4437       4551      2423
PSY_10      13901     11625      8823      7168       5949      4294       4098      2443
Average     14497.6   12020.7    9163.8    7199.2     6166.3    4201.7     4305.9    2341.3
PSYTB       43,603    32,598     20,253    9,248      15,826    4,821      13,560    2,555
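The figures in Table 7 follow from a single cumulative sum over frequency-ranked lemmas. The following minimal sketch assumes that lemma frequencies and the set of proper-noun lemmas have already been extracted; it illustrates the procedure described above, not the code actually used:

```python
def lemmas_needed(freqs, target=0.98, proper_nouns=frozenset()):
    """Number of lemmas required to reach `target` coverage of running
    words. `freqs` maps lemma -> frequency in a text or corpus. If
    proper-noun lemmas are supplied, their coverage is subtracted from
    the target first (e.g., 98% - 4% = 94%)."""
    total = sum(freqs.values())
    prn_cover = sum(f for lemma, f in freqs.items()
                    if lemma in proper_nouns) / total
    adjusted = target - prn_cover

    covered, needed = 0.0, 0
    # Rank non-proper lemmas by frequency and accumulate coverage.
    for f in sorted((f for lemma, f in freqs.items()
                     if lemma not in proper_nouns), reverse=True):
        if covered >= adjusted:
            break
        covered += f / total
        needed += 1
    return needed
```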

Further illustration of the lexical diversity in introductory psychology textbooks can be seen in Table 8, which notes the book range distribution of lemmas (excluding proper nouns) in the PSYTB corpus. Of these 32,598 lemmas in the whole corpus,


13,484, or over 40%, are found in only one of the 10 textbooks. Over two-thirds of the lemmas in the corpus have a range of four or fewer textbooks. There would be no major surprise in finding that, within an individual textbook, most words are used infrequently and, thus, have limited range within that textbook. This is the nature of lexical distributions and has been demonstrated time and again throughout vocabulary research (e.g., Leech et al., 2001). What is perhaps more surprising is that there is such variation between textbooks, to the extent that over two-thirds of the lemmas found in the PSYTB corpus are used by fewer than one half of the authors.

Table 8 Book Range Distribution of Non-proper Noun Lemmas in the PSYTB Corpus

Textbook Range     Number of Lemmas (cumulative)
1 textbook         13,484
2 textbooks        17,630
3 textbooks        20,330
4 textbooks        22,177
5 textbooks        23,573
6 textbooks        24,772
7 textbooks        25,838
8 textbooks        26,945
9 textbooks        28,256
10 textbooks       32,598

Tables 7 and 8 highlight the tremendous lexical diversity present even in a domain as narrow as introductory psychology textbooks. This diversity, in turn, highlights the challenge for learners who want to develop the size vocabulary that they will need to maximize their chance of adequate text comprehension. On average,


approximately 7,200 lemmas will provide 98% coverage of an introductory psychology textbook. In most instructional contexts, though, it is not realistic to assume that we could predict which text a student might be required to read and then focus only on this set of (on average) 7,200 lemmas. These 7,200 words might ensure adequate coverage of one introductory psychology textbook that a student might be assigned, but not necessarily of a different book. Thus, estimates of the lexical challenge are typically calculated based on a broader view reflecting the diversity existing in the whole target use domain (e.g., Hazenberg & Hulstijn, 1996; Nation, 2006; Schmitt, 2003). This diversity is represented by an entire representative corpus rather than by individual texts within a corpus. This is the method by which Adolphs and Schmitt (2003) arrived at their figure for spoken interactions, the way Nation (2006) arrived at his figures for both newspapers and unscripted spoken English, the way Hazenberg and Hulstijn (1996) arrived at their figure for academic Dutch texts, and the way Webb and Rogers (2009a, 2009b) arrived at their figures for television and movie genres. Taking Hazenberg and Hulstijn's (1996) study as an example, the corpus upon which they based their estimates was the Institute of Netherlandic Lexicology (INL) Corpus. This corpus consisted of a wide variety of contemporary Dutch fiction and non-fiction texts comprising 42 million words. The researchers found that word coverage of just under 90% of this whole corpus could be attained with 11,723 head words. Adding proper nouns, they felt, would provide the additional coverage needed for adequate comprehension (i.e., 95%, based on Laufer's 1989 estimate). This does not mean that any individual text from this corpus, or even any smaller subset of texts from this corpus, requires such a large vocabulary for a desired


word coverage percentage. Rather, this size vocabulary is required to provide approximately 95% coverage of a 42-million-word corpus. Following this methodology, then, based on the PSYTB corpus, the lexical challenge would be estimated at just over 9,200 lemmas. That is, a vocabulary of 9,200 lemmas is required to provide 98% coverage of the psychology textbooks represented in the PSYTB corpus. By mastering these 9,200 words, students would be ensured 98% word coverage, regardless of which textbook they were ultimately assigned. However, it is absolutely crucial to note that this estimate rests on the assumption that the PSYTB corpus captures the lexical diversity in the domain of introductory psychology textbooks. The experiment detailed in the following section, Section 4.2, assesses the validity of this assumption.

4.2 Understanding Lexical Diversity in the Target Domain

If I were to follow the typical methodology of "How many words are needed to...?" studies (e.g., Adolphs & Schmitt, 2003; Hazenberg & Hulstijn, 1996; Nation, 2006; Webb & Rogers, 2009a), I would be able to conclude with the findings presented in Section 4.1. These "How many words are needed to...?" studies begin by demonstrating the lexical diversity in their corpus or, less frequently, in individual texts in their corpus. Estimates of the lexical challenge are then proposed based on the observed diversity required to meet a target word coverage percentage. I have completed these steps, and thus might conclude that a vocabulary of just over 9,200 lemmas will provide the word coverage required for adequate comprehension of introductory psychology textbooks (i.e., 98%).


This estimate would be based on the assumption that the PSYTB corpus is a complete representation of the lexical diversity that students might encounter. However, is it safe to confidently propose such a figure without first assessing the degree to which my corpus indeed represents the lexical diversity in this target domain? The studies noted above have done just that; that is, their estimates are based on corpora whose representativeness of lexical diversity has not been assessed. The experiment detailed in this section demonstrates a potential problem with these estimates. Before examining the degree to which the PSYTB corpus represents the lexical diversity in the target domain, it is first necessary to consider what it would mean to completely represent lexical diversity, as well as how such diversity could be assessed. Complete representation of lexical diversity within a target domain has been termed "lexical saturation" (Belica, 1996, pp. 61-74) or "lexical closure" (McEnery & Wilson, 1996). Essentially, the concepts of saturation and closure suggest that a corpus has completely represented the lexical diversity in a target domain: no other lexical items are used in this target domain. Unfortunately, it is impossible to conclude with certainty what the total possible lexical diversity in introductory psychology textbooks is without compiling a corpus comprising every possible introductory psychology textbook. However, constructing a corpus of all currently available textbooks would simply not be practical, for a number of reasons. Further compounding the challenge, new textbooks and new editions of existing textbooks are published every year. Fortunately, it is possible to get a sense of the extent to which a corpus has indeed represented the total possible lexical diversity in a given domain, even without acquiring


every text in this domain. This trend can be observed by alternately growing and analyzing a corpus until it becomes evident that the addition of new texts provides a minimal number of, or, ideally, no, new vocabulary items. Lexical growth curves illustrate such trends (e.g., Baayen, 2001; Evert & Baroni, 2006). Figure 2 below illustrates two hypothetical lexical growth curves that would lead to different conclusions regarding the degree to which lexical closure/saturation (i.e., representation of lexical diversity) has been achieved for a target domain. The curve represented by a dotted line illustrates a scenario in which little or no evidence of lexical saturation exists. In this example, the number of new words added clearly lessens after the second segment is added to the corpus, but then appears to maintain a fairly steady increase of nearly 200 novel lexical items added with each subsequent segment. From this curve, we might predict that growing the corpus would add new lexical items, and could thus conclude that we have likely not captured all of the diversity in our domain. In contrast, the curve represented by the solid line appears to indicate that little lexical novelty is added with the addition of the final two segments. Such a curve suggests that representation of total lexical diversity has been achieved and that there is perhaps no benefit to further growing the corpus.


[Figure 2 appeared here: two hypothetical lexical growth curves. X-axis: Corpus Size / Number of Texts Added; y-axis: Number of Distinct Lexical Items (0-1,200). Solid line: "Clearly Trending Toward Lexical Saturation"; dotted line: "Little or No Sign of Lexical Saturation."]
Figure 2. Two hypothetical lexical growth curves illustrating two possible scenarios

The speed with which such a state of closure can be achieved (or, indeed, even the possibility of achieving closure) would seem to be highly dependent upon the scope of the domain being represented in a corpus. For example, McEnery and Wilson (1996) appeared to reach a high degree of lexical closure after just 110,000 words in their corpus of IBM users' manuals. Conversely, though achieving closure was not their goal, Evert and Baroni (2006) found a steady and steep increase in the number of novel word types added, and no evidence of closure, even beyond the 75,000,000-word mark in the BNC. In the former case, the IBM manual corpus, the restricted nature of the target domain (restricted by both text type and, likely, topic) allowed for what the researchers termed "premature closure" (p. 156). In contrast, the BNC contains an extremely wide variety of registers and topics, seemingly placing no restrictions on the vocabulary it can contain. This point is an important one. As mentioned above, Grabe (2009) suggests that there may be over 1,000,000 words in English if we include frequent technical or

scientific words. This enormous diversity suggests that, depending on the scope of our target domain, a lexical growth curve could potentially continue upwards even if we were able to include every text in our target domain. In other words, we may never reach a point where we see absolutely no lexical growth from adding additional texts. The domain being analyzed in the current dissertation is arguably much narrower than that represented in the BNC. In fact, placing it on a continuum from the exceptionally broad domain of the BNC (Evert & Baroni, 2006) to the extremely narrow, restricted domain of IBM users' manuals analyzed by McEnery and Wilson (1996), the narrow scope of my domain, introductory psychology textbooks, would logically seem to lean much further toward the scope of the IBM users' manuals. That is, while introductory psychology textbooks likely cover more topics than IBM users' manuals do, this set is notably restricted by convention (Griggs, Jackson, & Marek, 1999), and certainly much more restricted than the BNC's target domain: contemporary English. Following are the results of the lexical diversity analysis of the PSYTB corpus. Figure 3 illustrates the lexical growth in the PSYTB corpus as new textbooks are added to the corpus in five different sequences.⁶ It is important to note that the growth illustrated in this figure does not include growth that would be contributed by proper nouns. Sequences 1-3 in Figure 3 illustrate the growth when textbooks are added in a random sequence. Sequences 4 and 5 were specifically designed with the intention of illustrating the range of variation possible. Sequence 4 illustrates the growth with
⁶ The experiment was repeated in five sequences to account for the substantial variety in the potential contribution of lexical items by different texts (i.e., between 11,558 and 17,641 total lemmas; see Table X).


textbooks added in order of decreasing lexical diversity (i.e., the first textbook added has the greatest lexical diversity of the 10 books, the second book added has the second greatest lexical diversity, etc.). Sequence 5 illustrates the growth with textbooks added in order of increasing lexical diversity. The average growth of the five different sequences is also illustrated, with a solid line. On average, there appears to be a fairly steady contribution of approximately 4-6% new lemmas to the accumulating stock of total lemmas, even with the addition of each of the final three textbooks. Thus, we might conclude that this corpus of 10 whole textbooks, comprising over three million total running words and approximately 33,000 distinct lemmas, does not represent the total possible lexical diversity in introductory psychology textbooks. On average, the lexical diversity grows by 4% (or approximately 1,283 lemmas) even with the addition of the final textbook.


[Figure 3 appeared here: lexical growth curves for the PSYTB corpus. X-axis: textbooks added (1st through 10th); y-axis: number of distinct lemmas (0-35,000). Series: Sequences 1-5 and their average.]

Figure 3. Lexical growth curve for PSYTB (not including proper nouns)
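A curve like those in Figure 3 can be computed with a few lines of code. A minimal sketch, assuming each textbook is available as the set of its distinct non-proper-noun lemmas:

```python
def growth_curve(book_lemma_sets):
    """Cumulative number of distinct lemmas as books are added in the
    given order (one curve per ordering, cf. Sequences 1-5)."""
    seen, curve = set(), []
    for lemmas in book_lemma_sets:
        seen |= lemmas
        curve.append(len(seen))
    return curve

# Sequences 4 and 5 are just sorted orderings by per-book diversity:
#   sorted(books, key=len, reverse=True)  # decreasing diversity
#   sorted(books, key=len)                # increasing diversity
```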

Table 8 above showed that, of the 32,000+ lemmas in the PSYTB corpus, over 40% occur in only one textbook. The distribution of these 13,484 lemmas across the textbooks can be seen in Table 9.

Table 9 Number of Non-proper Noun Lemmas Unique to Each Textbook

PSY_1    PSY_2    PSY_3    PSY_4    PSY_5    PSY_6    PSY_7    PSY_8    PSY_9    PSY_10
1,147    1,296    1,759    811      2,037    1,184    921      1,566    1,690    1,075

As we can see in Table 9, there are between 811 and 2,037 unique lemmas per textbook. Though the experiment illustrated in Figure 3 only tested lexical growth across five sequences, Table 9 indicates that, regardless of sequence, a 10th textbook would typically add more than 1,000 lemmas (accounting for 2.49%-4.95% of total word coverage) to our growing diversity. Why might there be such diversity within the narrow domain of introductory psychology textbooks? There indeed appears to be a good deal of uniformity to the main topics surveyed in the 10 introductory textbooks in this corpus. For example, all 10 textbooks have chapters on similar topics, including learning, memory, emotion, intelligence, perception, disorders, treatments for disorders, and social psychology. However, there is also a wide range of examples that authors employ to explore these topics, examples likely meant to engage readers who are novices to the field of psychology. This diversity in engaging examples leads to a diversity in the vocabulary employed. Consider the variety of vocabulary in the examples below, from chapters on emotion. These excerpts demonstrate the tremendous variety of examples (i.e., vocabulary) that authors incorporate in the attempt to make core topics in the field of psychology relatable to and memorable for readers.

"Nervous about an important encounter, we feel stomach butterflies. Anxious over speaking in public, we frequent the bathroom. Smoldering over a conflict with a family member, we get a splitting headache. You can surely recall a time when

you were overcome with emotion. I retain a flashbulb memory for the day I ..." (Meyers, p. 371)

"I bet you get mad when you trip over something on the sidewalk or bang your knee against a wall. Right?" (Nairne, p. 386)

"Why is the couple who just won the multimillion-dollar lottery still bickering?" (Nairne, p. 386)

"To illustrate, suppose you come home one evening and find a cockroach nesting on your toothbrush." (Nairne, p. 386)

4.3 Chapter 4 Conclusions

Findings from this chapter highlight the wide lexical diversity that learners may encounter even in a very narrow target domain. Indeed, in order to have adequate word coverage (i.e., 98%) of this set of 10 textbooks, learners need a vocabulary of at least 9,200 lemmas. However, while this analysis of 10 textbooks provides a window into the vocabulary demands of introductory psychology textbooks, it is important to reflect on the fact that my corpus does not completely represent the lexical diversity of my target domain. In short, the PSYTB corpus does not capture all of the lexical diversity that students might encounter in their introductory psychology textbook. Thus, I do not have evidence allowing me to confidently propose an estimate regarding the vocabulary size required to provide adequate coverage of these textbooks. More importantly, this finding raises an important question regarding previously proposed estimates of the size vocabulary that users need to accomplish different tasks: If I have no evidence to support an estimate of the lexical challenge existing in my narrow domain based on a comparatively large


corpus of 3.1 million words, how valid are previously proposed estimates, which are often for much broader domains? This chapter has demonstrated that there is likely far more lexical diversity than has been accounted for by estimates of the lexical challenge posed by different language use domains. The following chapter, Chapter 5, investigates whether this unaccounted-for diversity might have an effect on attempts to identify important words in a language use domain. In other words, if corpora fail to represent the possible lexical diversity in the domains that they are designed to represent, to what extent are word lists based on these corpora reliably representative of the important words in target domains?


Chapter 5 Results and Discussion Part 2: Reliably Capturing the Important Words in the PSYTB Corpus

5.1 Introduction to the Analysis

While we must understand total lexical diversity in a domain in order to truly appreciate the lexical challenge that target language domain users face, we might also question the feasibility of identifying all words that could potentially be encountered in a domain (as demonstrated in Chapter 4), or even the utility of such a list for instructional purposes. Indeed, in the case of introductory psychology textbooks, even in the unlikely scenario that we could predict well in advance which particular textbook learners would be required to read, it would be impossible to cover the number of words these learners would need in order to achieve 98% word coverage. Each of these 10 textbooks requires, on average, over 7,200 lemmas to provide this 98% coverage, and students would need over 9,200 lemmas to be prepared for the variety of textbooks that they might be assigned. And, as demonstrated in Chapter 4, even 9,200 lemmas is likely an underestimate. No EAP program could hope to provide meaningful attention to this amount of vocabulary in a curriculum of reasonable length. Thus, we might conclude that what we are in fact interested in is a narrower set of words with the greatest frequency and range, in order to provide learners and teachers with a more reasonable goal: a shorter, more robust set of important words upon which to focus their efforts. To further narrow the set of words for instructional focus, such sets of words have often been based on predetermined distributional characteristics of vocabulary. For example, the AWL was based on a set of range, frequency, and

dispersion criteria for inclusion. Aside from not being previously classified as general vocabulary (i.e., on West's 1953 General Service List), AWL words had to occur at least 10 times in each of the four main sections of the corpus, in at least 15 of the 28 disciplines represented in the corpus, and at least 100 times in the entire corpus. These criteria produced a set of 570 word families accounting for, on average, 10% of the running words in the four sections of the Academic Corpus. Several studies over the past five years or so have adopted similar predetermined range and frequency criteria to identify lists of words that seem important (i.e., that frequently occur across a range of texts and/or topics) to more specialized domains, including agriculture (Martinez et al., 2009), applied linguistics (Vongpumivitch et al., 2009), and medicine (Chen & Ge, 2007). While the AWL and other specialized lists have certainly served the important function of helping focus learning and instruction, one might also ask whether these lists indeed represent the naturally occurring lexical distributions in the target domains. Asked another way, how certain are we that the corpora upon which lists of important words were based were sufficiently representative of these distributions? Without answering this question, confidence in lists produced from these corpora should be tempered. While we can be confident that researchers have identified what they have operationalized as important words in the corpora that they have used to represent their target domain, we have no evidence that these lists are in fact generalizable to the target domain.
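Stated as code, an AWL-style inclusion test is a conjunction of range and frequency conditions. The sketch below simply restates the criteria summarized above; the argument names and data shapes are illustrative assumptions, not Coxhead's actual procedure:

```python
def meets_awl_style_criteria(word, section_freqs, discipline_freqs,
                             corpus_freq, general_service_words):
    """Coxhead-style (2000) inclusion test: not already general-service
    vocabulary, at least 10 occurrences in each of the four main
    sections, attested in at least 15 of the 28 disciplines, and at
    least 100 occurrences in the whole corpus. `section_freqs` and
    `discipline_freqs` are lists of per-section/per-discipline
    frequency dicts (assumed shapes for illustration)."""
    if word in general_service_words:
        return False
    if any(freqs.get(word, 0) < 10 for freqs in section_freqs):
        return False
    disciplines_attested = sum(
        1 for freqs in discipline_freqs if freqs.get(word, 0) > 0)
    return disciplines_attested >= 15 and corpus_freq.get(word, 0) >= 100
```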


The surprising fact is that no studies to date have investigated this question. Rather, previous research has essentially gone from compiling corpora that are "hopefully" (Schmitt, 2010, p. 67) sufficiently representative straight to extracting lists of words that meet certain distributional characteristics in these corpora. Indeed, in a recent book by Nation and Webb (2010) titled Researching and Analyzing Vocabulary, the following steps are offered for word list development:
1. Decide on the research question the list will be used to answer, or the reason for making the list.
2. Decide on the unit of counting you will use: word type, lemma, or word family. This decision should relate closely to your reason for making the list.
3. Choose or create a suitable corpus. The makeup of the corpus should reflect the needs of the people who will benefit from the use of the list.
4. Make decisions about what will be counted as words and what will be put into separate lists. For example, will proper nouns be part of the list, or will they be separated in counting?
5. Decide on the criteria that will be used to order the words on the list. These could include range, frequency, and dispersion, or some summative measure like the standard frequency index (Carroll, Davies, and Richman 1971).
6. Cross-check the resulting list on another corpus or against another list to see if there are any notable omissions or unusual inclusions or placements. (Nation & Webb, 2010, p. 135)
The steps delineated in this procedure, particularly step #3, highlight a potentially important oversight. What exactly is a "suitable" corpus? How exactly can list creators


determine whether the chosen or designed corpus in fact "reflect[s] the needs" of target list users? If a corpus comprises a complete collection of the readings that word list users will encounter, then certainly the corpus is suitable, reflective of the needs of users. However, this is not often the case. As noted in Chapter 2, typical practice in corpus design is to select texts which represent a balance of the range of, for example, topics or registers that list users will encounter, and to then collect as many samples as is feasible. The assumption then appears to be that accounting for these considerations will lead to a representative corpus. In turn, the word list culled from this corpus is thought to represent the important words in the target domain. What does not typically happen is an assessment of the reliability of lists produced from these corpora. Step #6 in Nation and Webb's (2010) proposed procedure does suggest that list creators assess the ecological validity of a word list to a certain degree. That is, step #6 proposes that list creators assess whether a word list operates in a similar fashion in comparison corpora, or compares favorably with other similarly purposed lists. However, the example provided by Nation and Webb (2010) suggests that the purpose of step #6 is to check whether sets of words are properly ordered according to frequency band. That is, the point of the analysis is to assess whether the set of words in the top frequency band (e.g., the most frequent 1,000 words) provides a higher percentage of word coverage than does the set of words in the second frequency band (e.g., the second 1,000 words), and so on, in a comparison corpus. If so, the authors would argue, the list has some evidence of ecological validity. If not, the list might have to be revised.


Nation (2006) describes this process in detail with regard to the BNC 1,000-word lists. Specifically, using a corpus comprised of nine previously compiled corpora as a comparison, he investigated whether or not the first 1,000 words from the BNC list accounted for more tokens in the comparison corpus than did the BNC's second 1,000 words, the second 1,000 words more than the third 1,000 words, etc. Because he found that these lists did indeed provide the anticipated decreasing coverage, aside from a small hiccup between the 9K and 10K lists, he concluded that the lists were in fact "clearly properly ordered" (p. 65). While this analysis can provide important evidence regarding the proper ordering of each frequency band as a whole set of words, it is crucial to note that it does not provide details on the reliability of individual members of each band. This point is actually acknowledged by Nation (2006): "this approach does not show that each word family member is in the right list" (p. 64). In other words, while the BNC 1K did in fact provide more coverage than the BNC 2K in both the BNC and the comparison corpus, there may be words in the BNC 2K (or even in the BNC 3K, 4K, etc.) which provide higher coverage in the comparison corpus than do certain words in the BNC 1K. Because of this uncertainty, how can we confidently propose to teachers and learners that, for example, the first two or three BNC lists are the lists that merit focus? Alternatively, if the other approach Nation and Webb (2010) proposed in step #6 were followed (that is, if Nation (2006) had extracted a list of the most frequent 2,000 or 3,000 words from the comparison corpus), would this list be the same as the lists produced from the BNC? And if not, how would a list creator go about rectifying any


differences (i.e., notable omissions or unusual inclusions) to create a stable and reliable list? The purpose of this chapter is to investigate the reliability of word lists and the design of corpora required to produce reliable word lists. Toward this goal, the following question is addressed: What kind of sample is required to reliably capture the important words in a domain? The following sections detail a number of experiments designed to answer this question. Specifically, analyses in Section 5.2 assess the degree to which different-size samples are able to produce important word lists which reliably reflect the lexical distribution in my target domain: introductory psychology textbooks. The second set of experiments (Section 5.3) considers the definition of importance, and whether word lists based on different definitions of importance (i.e., meeting different distributional criteria) can be reliably represented. The results of these experiments will demonstrate the ability of different-size samples to reliably capture lists of important words in introductory psychology textbooks. More importantly, these results will provide crucial insights regarding the reliability of previously proposed corpus-based word lists. For all experiments detailed in this chapter, the target domain was operationalized as the 10 complete textbooks comprising the PSYTB corpus. It is important to note that I am making no claim that the PSYTB corpus, the whole corpus of 10 textbooks, is a perfect representation of lexical distributions in introductory psychology textbooks. The PSYTB's incomplete representation of lexical diversity in introductory psychology textbooks was highlighted in Chapter 4. Thus acknowledged, however, these

98

experiments will allow a widow into the size corpus required to represent lexical distributions in a target domain.

5.2 Capturing the Important Words in the PSYTB Corpus

The first set of experiments was designed to assess the ability of corpora to capture a list of important words from a target domain. Toward this goal, the first step was to decide which criteria for importance would be appropriate. As a place to start, I considered the criteria that Coxhead (2000) used in selecting AWL words. As noted above, Coxhead proposed that a word merited inclusion on the AWL (i.e., it was deemed worthy of instructional focus) if it occurred at least 100 times (approximately 28 times/million words) in her corpus, at least 10 times in each of the four main macrodisciplines in her corpus, and in approximately one half of the subdisciplines represented in her corpus. Though not a perfect equivalent, an introductory psychology book could be seen as a disciplinary overview, with each chapter representing a different "field", or at least focus of study, within the discipline of psychology. Thus, as a place to begin for the first set of experiments, a word was deemed "important" if it occurred in one half of the chapters in the corpus (whether the corpus be the entire set of 10 textbooks, or a smaller set of textbooks sampled from the corpus for comparison). This single criterion had the added benefit of, in effect, forcing two additional criteria. First, if a word occurred in one half of the chapters in the corpus, it was, as a rule, also found in at least one half of the textbooks in the corpus. Additionally, even with no minimum frequency requirement, the chapter range requirement forced a minimum frequency requirement of approximately 22 occurrences per million words. By comparison, the minimum frequency requirement for words on the AWL, 100 occurrences in the corpus, norms to approximately 29 occurrences per million words. Though there is a difference in minimum frequency criterion between Coxhead's study and this set of experiments (i.e., 29 occurrences/million vs. 22 occurrences/million), it is important to keep in mind that the AWL criterion was based on the frequency of occurrence of word family members combined, rather than the combined frequency of lemma members. Thus, it is to be expected that frequencies for lemmas would be lower than they would be for the word families to which they belong. If anything, my inclusion criteria are somewhat stricter than those used by Coxhead.

The experiment was conducted as follows. First, the criterion for importance (i.e., occurrence in 50% of chapters) was applied to the whole corpus of 10 books, generating a list of important lemmas. Then, different size subsamples from the corpus, from single whole textbooks through a set of nine whole textbooks, were assembled. These different size samples might be thought of as different sampling rates: samples of one textbook (out of 10 textbooks) could be considered a 10% sampling rate, samples of two textbooks a 20% sampling rate, etc. Then, the same range criterion was applied to each sample, and lists of important words were generated for each sample. The list of important words for each set was then compared with the word list generated by the whole corpus.

The comparisons between the lists were made as follows. First, I identified which words met the criteria of importance in both the whole corpus and the sample corpus. Then, I identified which words, if any, only the sample identified as important. Then, I identified which words, if any, only the whole corpus identified as important. This comparison is illustrated in the Venn diagram in Figure 4.

[Figure 4 is a Venn diagram. The left circle represents the list created by the sample and the right circle the list created by the whole corpus. The left-only region contains words meeting the criteria only in the sample (error with precision); the overlapping region contains words meeting the criteria in both sample and corpus; the right-only region contains words meeting the criteria only in the corpus (error with completeness).]

Figure 4. Comparing lists of important words from the whole corpus with the lists identified in the samples

Essentially, there can be two types of difference between sample lists and the whole corpus list. First, some words might meet the criteria of importance in the sample (i.e., will occur in 50% of the chapters in the sample), but not in the whole corpus. These additional words constitute error with precision, as they are incorrectly identified as important when we can see that they are not in fact important in the target domain (operationalized as the whole PSYTB corpus of 10 textbooks). Conversely, some words will not meet the criteria of importance in the sample (i.e., will occur in fewer than 50% of the chapters in the sample), but will in the whole corpus. These missing words constitute error with completeness, as their absence makes the sample lists incomplete.

Interpretation of these comparisons was made in the following way. If a subset of textbooks, for example, a sample of three or four complete textbooks, provided a list comparable to one generated by the whole corpus of 10 complete textbooks, it could be argued that the subset of textbooks sufficiently represents the lexical distributions in the domain (represented by the set of 10 textbooks) and that additional books are unnecessary. Alternatively, if there were still notable differences between a list produced from a sample and the list produced from all 10 textbooks, it could be argued that the sample did not adequately represent the lexical distribution in the domain (i.e., the PSYTB corpus).

Table 10 provides the results of an experiment investigating whether one whole psychology textbook provides a sufficient sample of the important words in the target domain. Specifically, it summarizes the comparison between lists produced from individual whole textbooks and the list produced from the whole corpus of 10 textbooks. Five important word lists were generated, one for each of five different textbooks, in order to account for between-book diversity. In the first row of Table 10, we can see that I did not apply any minimum frequency criteria to the lists, thus the label "none." Following this row to the right, we can then see how many words met the criteria of importance in each of the five textbooks: 1,745 words occurred in at least 50% of the chapters in Textbook 1, 1,771 words met this criterion in Textbook 2, etc. These lists, then, were compared with the list of 1,532 words that were found in at least 50% of the chapters in the whole corpus.
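Although no analysis code is reported in this dissertation, the criterion and the comparison just described are straightforward to express programmatically. The following Python sketch is purely illustrative: it assumes each chapter has already been reduced to a list of lemmas, and the function names (important_lemmas, compare_lists) are hypothetical labels, not anything taken from the study.

```python
from collections import Counter

def important_lemmas(chapters, min_chapter_proportion=0.5):
    """Return the set of lemmas occurring in at least the given
    proportion of chapters; each chapter is a list of lemmas."""
    threshold = min_chapter_proportion * len(chapters)
    chapter_counts = Counter()
    for chapter in chapters:
        chapter_counts.update(set(chapter))  # count each lemma once per chapter
    return {lemma for lemma, n in chapter_counts.items() if n >= threshold}

def compare_lists(sample_list, corpus_list):
    """Quantify the two kinds of difference illustrated in Figure 4."""
    only_in_sample = sample_list - corpus_list   # error with precision
    only_in_corpus = corpus_list - sample_list   # error with completeness
    total_difference = len(only_in_sample) + len(only_in_corpus)
    return only_in_sample, only_in_corpus, total_difference
```

Under this sketch, the whole-corpus list of 1,532 lemmas would correspond to important_lemmas applied to all 157 chapters, and each sample list to the same function applied to the chapters of the sampled textbooks.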


Just from the difference in list size we can begin to see that there is indeed a difference between the sample lists and the whole corpus list. For example, 1,745 lemmas were identified as important in Textbook 1, whereas only 1,532 lemmas were identified as important across the whole corpus. However, this surface observation only begins to tell the story of the difference between the two lists. The word list culled from Textbook 1 is not necessarily only 213 words (1,745 - 1,532 = 213) different from the whole corpus word list. Rather, the Textbook 1 word list and the whole corpus word list share some words, but there are some words identified as important only in the sample, and some words identified as important only in the whole corpus. Again, Figure 4 illustrates this phenomenon. In Table 10, we can see precision error in the column headed "Only in sample." On average, there were approximately 429 words that met the criteria of importance in the samples of one textbook that did not maintain the 50% chapter range across the whole corpus. These words account for, on average, 28.03% of the difference between these sample and whole corpus lists. Error with completeness is noted in Table 10 in the column titled "Only in whole corpus." From this table, we can see that, on average, the whole corpus identifies 163 words as important that the samples do not. This set of missing words, completeness error, accounts for 10.64% of the difference between the sample lists and the whole corpus list. In sum, there are, on average, 592.40 (SD 143.91) words that are not shared by both lists. So, we can say that lists produced from samples of one textbook are, on average, 38.67% (SD 9.39%) different from a list identified by the whole corpus of 10 books.
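To make the arithmetic explicit, these percentages appear to use the whole-corpus list (1,532 lemmas) as their base: 429.40 / 1,532 ≈ 28.03%; 163.00 / 1,532 ≈ 10.64%; and 429.40 + 163.00 = 592.40, so 592.40 / 1,532 ≈ 38.67%.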


Table 10
Comparison of Lists Produced by One Whole Textbook with the List Produced by the Whole Corpus (50% chapter range requirement)

Words meeting criteria (min freq required: none):
  Textbook 1: 1,745    Textbook 2: 1,771    Textbook 3: 1,895
  Textbook 4: 1,470    Textbook 5: 2,176    Whole corpus: 1,532

Average words not meeting criteria in both sample and whole corpus:
  Only in sample:        429.40 (28.03%)
  Only in whole corpus:  163.00 (10.64%)
  Total difference:      592.40 (38.67%)
  SD:                    143.91 (9.39%)


Thus, it would be fair to conclude that a sample of a single whole textbook does not provide an adequate representation of lexical distributions in the target domain. We can see this because these samples do not produce lists that are reasonably comparable to a list generated by a corpus of 10 textbooks. Stated simply, a sample of one whole textbook is too small. So what size sample can capture the important words identified from the target domain (i.e., the PSYTB corpus of 10 textbooks)? To answer this question, the comparisons were repeated with random sample sets of two through nine whole textbooks taken from the corpus of 10 textbooks. Again, 5 samples of each size were taken to account for between-book lexical variability. The 50% range criterion was applied to each sample, and the word lists generated were compared with the whole corpus list of 1,532 important words. Results of these comparisons can be seen in Table 11.


Table 11
Comparison of Lists Produced by Samples of Two through Nine Textbooks with Lists Produced by Whole Corpus (50% chapter range requirement)

Words meeting criteria (min freq required: none; whole corpus = 1,532):

  Size     Sample 1  Sample 2  Sample 3  Sample 4  Sample 5
  2 TBs    1564      1814      1506      1431      1612
  3 TBs    1529      1797      1519      1558      1822
  4 TBs    1784      1621      1521      1658      1555
  5 TBs    1685      1438      1513      1675      1718
  6 TBs    1603      1594      1605      1669      1565
  7 TBs    1567      1555      1576      1613      1506
  8 TBs    1546      1552      1556      1528      1606
  9 TBs    1569      1553      1524      1582      1553

Average words not meeting criteria in both sample and whole corpus:

  Size     Only in sample    Only in whole corpus  Total difference   SD
  2 TBs    183.20 (11.95%)   142.80 (9.32%)        326.00 (21.27%)    37.55 (2.45%)
  3 TBs    187.40 (12.22%)   87.40 (5.70%)         274.80 (17.93%)    72.72 (4.74%)
  4 TBs    146.80 (9.58%)    64.00 (4.17%)         210.80 (13.75%)    35.69 (2.33%)
  5 TBs    119.80 (7.81%)    59.00 (3.85%)         178.80 (11.66%)    35.24 (2.30%)
  6 TBs    100.00 (6.52%)    37.80 (2.47%)         137.80 (8.99%)     23.41 (1.53%)
  7 TBs    61.40 (3.98%)     43.00 (2.79%)         104.40 (6.76%)     9.29 (0.60%)
  8 TBs    47.60 (3.09%)     35.00 (2.27%)         82.60 (5.35%)      11.78 (0.76%)
  9 TBs    32.40 (2.10%)     21.20 (1.37%)         53.60 (3.47%)      9.24 (0.60%)
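The sampling procedure behind Tables 10 and 11 can likewise be sketched in a few lines, again with hypothetical names, reusing the helper functions from the earlier sketch and assuming the textbooks are stored as lists of chapter-level lemma lists:

```python
import random
import statistics

def run_sampling_experiment(textbooks, corpus_list, sample_size, n_replicates=5):
    """Draw n_replicates random samples of sample_size whole textbooks,
    apply the 50% chapter-range criterion to each, and summarize the
    number of lemmas not shared with the whole-corpus list.
    textbooks: dict mapping a book identifier to its list of chapters."""
    totals = []
    for _ in range(n_replicates):
        chosen = random.sample(sorted(textbooks), sample_size)
        chapters = [ch for book in chosen for ch in textbooks[book]]
        sample_list = important_lemmas(chapters)  # from the earlier sketch
        _, _, total_difference = compare_lists(sample_list, corpus_list)
        totals.append(total_difference)
    return statistics.mean(totals), statistics.stdev(totals)
```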


As we look at Table 11, it is important to reflect for a moment on the relationship between sample size and the generalizability of important word lists generated from samples of different sizes. As an example, consider a comparison between a sample of three textbooks and the entire corpus of 10 textbooks. In the PSYTB corpus, a sample of three textbooks accounts for approximately 45-48 chapters and 1,000,000 running words. This sample is equal to nearly one third of the total running words in the entire Academic Corpus from which Coxhead (2000) generated the AWL. Thus, by comparison, three complete textbooks would seem a reasonable sample size to represent the narrow domain of introductory psychology textbooks. However, when a comparison is made between lists culled from a subset of three textbooks and the list produced from the entire PSYTB corpus of 10 textbooks (approximately 3.1 million words), we see that a sample of three textbooks is not adequate. On average, the lists from samples of three textbooks capture approximately 94% of the important words generated from a corpus of 10 textbooks, but more than 12% of the words on these sample lists do not hold their currency across the entire corpus. In real terms, then, a list generated by a corpus of three textbooks differs, on average, by 275 lemmas from a list generated by the whole PSYTB corpus. This is a total difference of approximately 18%.

To put a face on this difference, Table 12 below provides examples of words that would not be shared by a list generated from one sample of three textbooks and a list generated by the whole corpus of 10 books. From this table, we can get an idea of some words that would be identified as important in either the sample of three textbooks or across the entire corpus, but not in both. More importantly, we can get an idea of how important it is to assess the representativeness of corpora and the reliability of conclusions drawn from our corpora. If representativeness were determined based solely on the size of the sample (i.e., three whole textbooks; approximately 1 million words), we might conclude that our corpus should be sufficiently representative, and that the word list it generates is worth instructional focus. Such a conclusion might be misguided, however. Class time might then be spent on 75 words that do not hold currency across the whole corpus, such as activate, adjust, or accomplish. Conversely, and perhaps more importantly, perhaps no focus would be given to the 86 words that actually do prove important across the entire corpus but did not meet the criteria of importance in the sample (e.g., process (n.), creative, or sensation).

Table 12
Comparison of Words Meeting Importance Criteria in a Sample of 3 Textbooks and Words Meeting Importance Criteria in the Whole Corpus

Lemmas that meet the criteria only in the sample (i.e., errors with precision):
activate, adjust, accomplish, sleep (v.), neutral, violence, moderate (adj.), diagnose, discrimination, design (n.), interview (n.), trigger (n.), manage, underlying, substantial, recognition, service (n.), team

Lemmas that meet the criteria only in the whole corpus (i.e., errors with completeness):
process (v.), sensation, creative, threat (n.), theme (n.), forget, video (n.), norm (n.), surface (n.), advantage, unique, responsible, description, design (n.), contain, encounter (n.), insight, progress (n.), estimate (n.)


It is a logical necessity that, as corpora reflect a greater proportion of the domain that they are designed to represent, the lists of important lemmas that they produce will more closely match the list representing important words in the whole domain. On average, this is the case here, as can be seen in Table 11. As the samples grow (e.g., from two to three to four, etc. textbooks), the difference between word lists (i.e., the average percentage of additional important lemmas and missing important lemmas) decreases. However, at what point might we conclude that lists from our sample reflect a reasonably equivalent set of important words? Many factors must be considered here. Practical considerations such as the time, effort, and expense of acquiring texts are certainly relevant. In other words, the fewer textbooks needed, the better. However, we must also consider the reliability of the lists we produce. Ideally, a list produced from a sample would maximally concur with a list produced by the whole corpus (i.e., the domain). That is, the sample list would be maximally precise (i.e., not include additional words that do not hold their currency across the whole corpus) and complete (i.e., identify all important words from the corpus). From Table 11, we can see that no sample, even a sample of nine out of 10 textbooks (a 90% sampling rate!), is able to produce a word list that perfectly mirrors the list produced from the whole corpus, at least when importance is operationalized as 50% chapter range.

5.3 Manipulating the Criteria for Important Words

Before going forward, it is necessary to consider what is meant by "important words." As noted in Chapter 2, determination regarding inclusion of words on important word lists has most frequently been based on the notions of "efficiency" or "impact," which are operationalized by distributional characteristics. Essentially, the more frequently and widely a word occurs, the better the return on instructional and learning time invested in this word. If a word meets certain predetermined frequency, range, and/or dispersion criteria, it merits attention. In the experiments detailed in Section 5.2, importance was operationalized as having a 50% chapter range. It must be acknowledged, however, that this criterion is somewhat arbitrary. However, it must also be noted that somewhat arbitrary criteria are a hallmark of many previous word list development studies discussed above (e.g., Coxhead, 2000: AWL). At what point does a word's range or frequency deem it important? Thus, the experiment detailed in Section 5.2 is partially repeated in this section to see how manipulation of frequency (Section 5.3.1) and range (Section 5.3.2) criteria affects word lists, and whether any definition of importance can be reliably represented.

5.3.1 Manipulating minimum frequency criteria

In order to assess the effect of frequency requirements on the reliability of word lists, the experiments detailed above were repeated with added minimum frequency criteria. The experiment was conducted as follows. First, the range criterion (occurrence in 50% of chapters) and six different frequency criteria (20 occurrences per million, 40 per million, 60 per million, 80 per million, 100 per million, and 200 per million) were applied to the whole PSYTB corpus of 10 books, generating six lists of important words (i.e., one based on each of the six minimum frequency criteria). Then, different size subsamples of the corpus, from single whole textbooks through nine whole textbooks, were assembled. Five samples of each size were taken in order to account for between-book diversity. Then, the same range and frequency criteria were applied to each sample, and lists of important words were generated for each sample. The list of important words for each set was then compared with the lists generated by the whole corpus.

Table 13 shows the comparison between word lists produced by samples of one complete textbook and the word lists produced from the whole corpus at each of the different minimum frequency requirements. From Table 13, we can see that as the minimum frequency criterion is increased (i.e., the column titled "Min freq required"), two things happen with regard to reliability. On the positive side, the percentage of total difference decreases, ultimately to about 30% with a minimum frequency of 80 occurrences/million words. Further increasing the minimum frequency requirement to 100 or 200 occurrences per million has little effect on the reliability. Essentially, then, even by imposing a very strict frequency criterion and identifying a restricted set of very frequent, and thus, presumably, very important words, samples of one textbook are still only able to capture lists with approximately 70% similarity to the list identified by the whole corpus (i.e., the domain). With this size sample (i.e., one whole textbook), there is little perceived benefit toward the goal of lessening the difference between the lists, that is, of more reliably capturing the important words in the corpus. Rather, increasing the minimum frequency requirement only serves to produce smaller lists that still differ from the corpus lists by approximately 30%.
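Expressed in terms of the earlier hypothetical sketch, the combined criterion might look as follows; the per-million norming mirrors the description above rather than any code reported in the study:

```python
from collections import Counter

def important_lemmas_with_freq(chapters, min_chapter_proportion=0.5,
                               min_per_million=0):
    """Combine the chapter-range criterion with a normed minimum
    frequency criterion (occurrences per million running words)."""
    range_set = important_lemmas(chapters, min_chapter_proportion)
    token_counts = Counter(lemma for chapter in chapters for lemma in chapter)
    total_tokens = sum(token_counts.values())
    return {lemma for lemma in range_set
            if token_counts[lemma] / total_tokens * 1_000_000 >= min_per_million}
```

Applying min_per_million values of 20, 40, 60, 80, 100, and 200 to both the whole corpus and each sample would reproduce the comparisons reported in Tables 13 and 14.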


Table 13
Comparison of Lists Produced by One Whole Textbook with Lists Produced by Whole Corpus (50% chapter range requirement; various minimum frequency requirements)

Words meeting criteria:

  Min freq  Sample 1  Sample 2  Sample 3  Sample 4  Sample 5  Whole corpus
  none      1745      1771      1895      1470      2176      1533
  20min     1745      1771      1895      1470      2176      1533
  40min     1678      1684      1833      1451      1965      1533
  60min     1464      1484      1607      1349      1593      1436
  80min     1284      1284      1366      1198      1333      1286
  100min    1107      1138      1154      1090      1162      1117
  200min    671       645       663       681       694       670

Average words not meeting criteria in both sample and whole corpus:

  Min freq  Only in sample    Only in whole corpus  Total difference   SD
  none      429.40 (28.01%)   163.00 (10.63%)       592.40 (38.64%)    143.91 (9.39%)
  20min     429.40 (28.01%)   163.00 (10.63%)       592.40 (38.64%)    143.91 (9.39%)
  40min     358.00 (23.35%)   180.80 (11.79%)       538.80 (35.15%)    110.47 (7.21%)
  60min     256.20 (17.84%)   203.80 (14.19%)       460.00 (32.03%)    78.27 (5.45%)
  80min     195.60 (15.21%)   195.60 (15.21%)       391.20 (30.42%)    48.75 (3.79%)
  100min    169.40 (15.17%)   161.20 (14.43%)       330.60 (29.60%)    47.48 (4.25%)
  200min    100.20 (14.96%)   102.40 (15.28%)       202.60 (30.24%)    24.51 (3.66%)


What happens when minimum frequency requirements are added to larger samples? Table 14 shows the results of comparisons between lists produced from different sample sizes at different minimum frequency requirements and lists produced by the whole PSYTB corpus at the same six minimum frequency requirements (i.e., 20, 40, 60, 80, 100, and 200 occurrences per million). As would be expected, increasing the minimum frequency requirement decreases the size of the lists. That is, fewer words meet the stricter minimum frequency requirements. Surprisingly, however, these smaller, more restricted, and arguably more important lists are not consistently more reliable than larger, less restricted lists. Even the 670 words that occur 200 times/million words in the whole corpus cannot be captured reliably from the samples. The only variable that appears to consistently improve reliability is sample size. That is, the larger the sample (i.e., the higher the sampling rate), the greater the reliability of lists produced from the samples.


Table 14
Comparison of Lists Produced by Samples of Two through Nine Textbooks with Lists Produced by Whole Corpus (50% chapter range requirement; various minimum frequency requirements)

Columns: minimum frequency required; words meeting criteria in Samples 1-5 and in the whole corpus; then the average words not meeting criteria in both sample and whole corpus (only in sample; only in whole corpus; total difference; SD).

2 TBs vs. 10
  none    1564  1814  1506  1431  1612  1532   183.20 (11.95%)  142.80 (9.32%)   326.00 (21.27%)  37.55 (2.45%)
  20min   1564  1814  1506  1431  1612  1532   183.20 (11.95%)  142.80 (9.32%)   326.00 (21.27%)  37.55 (2.45%)
  40min   1548  1752  1495  1430  1591  1532   167.60 (10.93%)  149.40 (9.75%)   317.00 (20.68%)  30.16 (1.97%)
  60min   1418  1529  1374  1371  1447  1436   134.00 (9.27%)   153.20 (10.60%)  287.20 (19.85%)  20.90 (1.44%)
  80min   1254  1317  1242  1232  1264  1286   107.20 (8.30%)   138.40 (10.71%)  245.60 (18.99%)  21.74 (1.68%)
  100min  1110  1146  1101  1095  1115  1117   109.00 (9.72%)   117.60 (10.49%)  226.60 (20.20%)  18.96 (1.69%)
  200min  644   677   674   636   643   670    59.80 (8.89%)    78.00 (11.60%)   137.80 (20.48%)  6.22 (0.92%)

3 TBs vs. 10
  none    1529  1797  1519  1558  1822  1532   187.40 (12.22%)  87.40 (5.70%)    274.80 (17.93%)  72.72 (4.74%)
  20min   1529  1797  1519  1558  1822  1532   187.40 (12.22%)  87.40 (5.70%)    274.80 (17.93%)  72.72 (4.74%)
  40min   1528  1763  1514  1556  1767  1532   172.40 (11.25%)  91.80 (5.99%)    264.20 (17.23%)  60.53 (3.95%)
  60min   1429  1544  1401  1465  1535  1436   128.00 (8.86%)   100.20 (6.94%)   228.20 (15.77%)  35.33 (2.44%)
  80min   1265  1328  1248  1272  1331  1286   97.40 (7.54%)    101.60 (7.87%)   199.00 (15.39%)  22.73 (1.76%)
  100min  1115  1157  1097  1141  1145  1117   93.00 (8.30%)    84.00 (7.49%)    177.00 (15.78%)  7.48 (0.67%)
  200min  678   670   674   676   673   670    50.60 (7.53%)    49.40 (7.35%)    100.00 (14.86%)  13.56 (2.02%)

4 TBs vs. 10
  none    1784  1621  1521  1658  1555  1532   146.80 (9.58%)   64.00 (4.17%)    210.80 (13.75%)  35.69 (2.33%)
  20min   1784  1621  1521  1658  1555  1532   146.80 (9.58%)   64.00 (4.17%)    210.80 (13.75%)  35.69 (2.33%)
  40min   1741  1611  1520  1644  1554  1532   135.80 (8.86%)   66.80 (4.36%)    202.60 (13.22%)  25.91 (1.69%)
  60min   1536  1465  1423  1484  1453  1436   100.60 (6.97%)   75.40 (5.22%)    176.00 (12.16%)  12.35 (0.85%)
  80min   1325  1287  1266  1306  1282  1286   76.40 (5.92%)    76.20 (5.90%)    152.60 (11.80%)  10.26 (0.79%)
  100min  1142  1116  1107  1132  1120  1117   64.60 (5.76%)    63.20 (5.64%)    127.80 (11.39%)  9.86 (0.88%)
  200min  673   673   640   693   674   670    38.00 (5.65%)    40.40 (6.01%)    78.40 (11.65%)   8.73 (1.30%)

5 TBs vs. 10
  none    1685  1438  1513  1675  1718  1532   119.80 (7.81%)   59.00 (3.85%)    178.80 (11.66%)  35.24 (2.30%)
  20min   1685  1438  1513  1675  1718  1532   119.80 (7.81%)   59.00 (3.85%)    178.80 (11.66%)  35.24 (2.30%)
  40min   1670  1438  1512  1665  1690  1532   111.40 (7.27%)   61.40 (4.01%)    172.80 (11.27%)  29.24 (1.91%)
  60min   1515  1373  1413  1511  1515  1436   79.80 (5.53%)    61.40 (4.25%)    141.20 (9.76%)   13.46 (0.93%)
  80min   1313  1247  1260  1318  1330  1286   62.60 (4.85%)    62.00 (4.80%)    124.60 (9.64%)   7.64 (0.59%)
  100min  1143  1100  1115  1133  1139  1117   53.60 (4.78%)    49.60 (4.13%)    103.20 (9.20%)   4.82 (0.43%)
  200min  679   661   663   693   675   670    31.40 (4.67%)    30.20 (4.49%)    61.60 (9.15%)    8.65 (1.29%)

6 TBs vs. 10
  none    1603  1594  1605  1669  1565  1532   100.00 (6.52%)   37.80 (2.47%)    137.80 (8.99%)   23.41 (1.53%)
  20min   1603  1594  1605  1669  1565  1532   100.00 (6.52%)   37.80 (2.47%)    137.80 (8.99%)   23.41 (1.53%)
  40min   1600  1590  1602  1659  1563  1532   97.00 (6.33%)    39.20 (2.56%)    136.20 (8.88%)   20.54 (1.34%)
  60min   1463  1477  1484  1509  1446  1436   75.80 (5.25%)    47.00 (3.25%)    122.80 (8.49%)   10.76 (0.74%)
  80min   1291  1317  1304  1324  1274  1286   56.60 (4.38%)    47.60 (3.69%)    104.20 (8.06%)   12.07 (0.93%)
  100min  1125  1157  1138  1153  1113  1117   51.20 (4.57%)    36.00 (3.21%)    87.20 (7.77%)    7.50 (0.67%)
  200min  662   676   659   673   683   670    24.60 (3.66%)    27.00 (4.02%)    51.60 (7.67%)    4.51 (0.67%)

7 TBs vs. 10
  none    1567  1555  1576  1613  1506  1532   61.40 (3.98%)    43.00 (2.79%)    104.40 (6.76%)   9.29 (0.60%)
  20min   1567  1555  1576  1613  1506  1532   61.40 (3.98%)    43.00 (2.79%)    104.40 (6.76%)   9.29 (0.60%)
  40min   1567  1553  1576  1610  1506  1532   60.60 (3.93%)    43.20 (2.80%)    103.80 (6.72%)   9.09 (0.59%)
  60min   1466  1446  1461  1477  1428  1436   54.00 (3.74%)    45.40 (3.14%)    99.40 (6.87%)    12.54 (0.87%)
  80min   1301  1282  1300  1309  1267  1286   44.80 (3.47%)    46.00 (3.56%)    90.80 (7.02%)    16.21 (1.25%)
  100min  1132  1125  1126  1141  1115  1117   42.20 (3.77%)    36.40 (3.25%)    78.60 (7.01%)    15.47 (1.38%)
  200min  668   674   669   673   682   670    23.00 (3.42%)    22.80 (3.39%)    45.80 (6.81%)    7.60 (1.13%)

8 TBs vs. 10
  none    1546  1552  1556  1528  1606  1532   47.60 (3.09%)    35.00 (2.27%)    82.60 (5.35%)    11.78 (0.76%)
  20min   1546  1552  1556  1528  1606  1532   47.60 (3.09%)    35.00 (2.27%)    82.60 (5.35%)    11.78 (0.76%)
  40min   1546  1551  1556  1528  1602  1532   46.80 (3.03%)    35.20 (2.28%)    82.00 (5.31%)    11.14 (0.72%)
  60min   1482  1461  1510  1434  1476  1436   57.80 (4.00%)    32.20 (2.23%)    90.00 (6.22%)    17.89 (1.24%)
  80min   1333  1285  1359  1288  1306  1286   49.80 (3.86%)    28.60 (2.21%)    78.40 (6.06%)    22.79 (1.76%)
  100min  1176  1136  1206  1122  1142  1117   50.80 (4.53%)    16.40 (1.46%)    67.20 (5.99%)    25.95 (2.31%)
  200min  727   675   745   667   676   670    37.60 (5.59%)    12.60 (1.87%)    50.20 (7.46%)    18.66 (2.77%)

9 TBs vs. 10
  none    1569  1553  1524  1582  1553  1532   32.40 (2.10%)    21.20 (1.37%)    53.60 (3.47%)    9.24 (0.60%)
  20min   1569  1553  1524  1582  1553  1532   32.40 (2.10%)    21.20 (1.37%)    53.60 (3.47%)    9.24 (0.60%)
  40min   1567  1553  1523  1580  1553  1532   32.40 (2.10%)    22.20 (1.44%)    54.60 (3.53%)    9.94 (0.64%)
  60min   1460  1445  1424  1466  1451  1436   24.20 (1.68%)    22.00 (1.52%)    46.20 (3.19%)    6.61 (0.46%)
  80min   1292  1284  1280  1297  1284  1286   20.20 (1.56%)    25.80 (2.00%)    46.00 (3.56%)    5.74 (0.44%)
  100min  1128  1129  1129  1133  1129  1117   19.60 (1.75%)    12.00 (1.07%)    31.60 (2.82%)    7.54 (0.67%)
  200min  675   674   671   671   664   670    10.40 (1.55%)    12.40 (1.84%)    22.80 (3.39%)    6.30 (0.94%)


Experiments detailed above attempted to account for the effect that frequency criteria can have on word lists and whether word lists based on different definitions of importance can be more reliably captured. In each case, the same criteria were applied to the whole corpus word list and the sample word lists. For example, when the minimum frequency requirement applied to the whole corpus was 100 occurrences/million, the same minimum frequency requirement was applied to the samples. However, is it appropriate to expect that words would reach the same minimum frequencies in a sample as they do in a language use domain? It has been noted that corpora for lexical studies need to be sufficiently large for less frequent linguistic items to occur (Biber, 1992; Sinclair, 1991). Taking this notion a step further, a corpus needs to be sufficiently large for lexical items to occur according to their distribution in the target domain. For example, imagine a word occurs at a frequency of 100 times per million words in a corpus of 100 texts. In a subsample of 50 texts, however, frequency variation from individual books will have a greater influence on this word's apparent frequency. If a few texts use this word frequently, the normed frequency of the word in the sample will be higher than in the whole corpus. Conversely, less frequent use by a few texts will cause the normed frequency to be lower than in the whole corpus. Thus, a sample may not accurately reflect the distribution of this word in the target domain. For this reason, it may be necessary to adjust the inclusion criteria in the sample to account for this variability. For example, in the present study, if we want to identify words that occur in one half of the chapters and 100 times/million words in the target domain, then we might expect that some words may occur with somewhat greater or lower frequency in a sample than in the target domain. For this reason, a short experiment was conducted to determine the effect of adjusting the inclusion criteria for words in the samples to account for this possible variability.

Table 15 shows what happens as the minimum frequency requirement is adjusted for samples while the minimum frequency requirement for the corpus is held at 100/million. The list against which subsample lists were compared was the list of 1,117 words culled from the whole corpus when the minimum frequency requirement was set at 100 occurrences/million. The minimum frequency requirement for the samples was lowered to 80/million and 60/million, and increased to 120/million and 140/million. As can be seen in Table 15, this manipulation, unfortunately, decreases only one type of error (i.e., either completeness or precision) while increasing the other type. This phenomenon can be exemplified by looking at experiments with samples of three textbooks. When the minimum frequency criterion is matched at 100 occurrences/million for both the whole corpus list and the subsample list, on average there are 84 missing words and 93 additional words. This leaves a completeness error rate of approximately 7.5% (84/1,117) and a precision error rate of approximately 8.3% (93/1,117), for a total error rate of 15.8%. By decreasing the frequency criterion of the subsample list to 80/million, we get a very welcome halving of the completeness error rate, to 3.5%, but, unfortunately, a steep increase, more than a doubling, of the precision error rate, to 18.4%. By adjusting in the other direction, increasing the minimum frequency criterion to 120/million for the sample, the opposite occurs. The precision error rate drops to 3.5%, but the arguably more important completeness error rate increases to 14.3%. The same thing happens regardless of the sample size (i.e., 2 books, 3 books, 4 books). Thus, this experiment provides evidence that, at least with the size of the corpus and samples used in this study, adjusting the minimum frequency criteria for the sample while holding the criteria stable for the corpus does not notably improve the overall reliability of the lists. In fact, with each sample, the overall difference between sample lists and whole corpus lists actually increases when minimum frequency criteria are not shared by the sample lists and the whole corpus lists.
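In terms of the hypothetical helpers sketched earlier, this experiment simply decouples the two cut-offs; roughly:

```python
# Hold the corpus criterion at 100 occurrences/million while varying the
# sample criterion, as in Table 15. corpus_chapters and sample_chapters are
# assumed to be lists of chapter-level lemma lists (hypothetical names).
corpus_list = important_lemmas_with_freq(corpus_chapters, min_per_million=100)
for sample_cutoff in (60, 80, 100, 120, 140):
    sample_list = important_lemmas_with_freq(sample_chapters,
                                             min_per_million=sample_cutoff)
    only_in_sample, only_in_corpus, total = compare_lists(sample_list, corpus_list)
    # a lower sample cut-off shrinks completeness error but inflates precision
    # error; a higher cut-off does the reverse
```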


Table 15
Comparison of Lists Produced by Samples and by Whole Corpus when Minimum Frequency Requirements are Adjusted for Samples (corpus minimum frequency criterion held at 100/mil.)

Average words not meeting criteria in both sample and whole corpus:

  Comparison          Sample min freq  Only in sample    Only in whole corpus  Total difference   Total difference SD
  2 books vs. corpus  60/mil.          354.20 (31.57%)   48.40 (4.31%)         402.60 (35.88%)    23.82 (2.12%)
                      80/mil.          208.60 (18.59%)   68.80 (6.13%)         277.40 (24.72%)    20.92 (1.86%)
                      100/mil.         109.00 (9.71%)    117.60 (10.48%)       226.60 (20.20%)    18.96 (1.69%)
                      120/mil.         53.20 (4.74%)     188.40 (16.79%)       241.60 (21.53%)    15.08 (1.34%)
                      140/mil.         23.40 (2.09%)     273.00 (24.33%)       296.40 (26.42%)    21.71 (1.93%)
  3 books vs. corpus  60/mil.          378.80 (33.76%)   26.00 (2.32%)         404.80 (36.08%)    36.42 (3.25%)
                      80/mil.          206.40 (18.40%)   39.60 (3.53%)         246.00 (21.93%)    16.45 (1.47%)
                      100/mil.         93.00 (8.29%)     84.00 (7.49%)         177.00 (15.78%)    7.48 (0.67%)
                      120/mil.         39.00 (3.48%)     160.00 (14.26%)       199.00 (17.74%)    10.27 (0.92%)
                      140/mil.         18.40 (1.64%)     250.80 (22.35%)       269.20 (23.99%)    10.47 (0.93%)
  4 books vs. corpus  60/mil.          366.40 (32.66%)   16.20 (1.44%)         382.60 (34.10%)    28.61 (2.55%)
                      80/mil.          194.80 (17.36%)   23.60 (2.10%)         218.40 (19.47%)    15.63 (1.39%)
                      100/mil.         64.60 (5.76%)     63.20 (5.63%)         127.80 (11.39%)    9.86 (0.88%)
                      120/mil.         20.80 (1.85%)     144.00 (12.83%)       164.80 (14.69%)    4.32 (0.39%)
                      140/mil.         11.60 (1.03%)     240.60 (21.44%)       252.20 (22.48%)    9.71 (0.87%)
  5 books vs. corpus  60/mil.          358.20 (31.93%)   14.80 (1.21%)         373.00 (33.24%)    50.95 (4.54%)
                      80/mil.          189.40 (16.88%)   17.80 (1.46%)         207.20 (18.47%)    18.05 (1.61%)
                      100/mil.         53.60 (4.78%)     49.60 (4.06%)         103.20 (9.20%)     4.82 (0.43%)
                      120/mil.         14.80 (1.32%)     134.60 (11.01%)       149.40 (13.32%)    6.62 (0.59%)
                      140/mil.         8.60 (0.77%)      237.80 (19.46%)       246.40 (21.96%)    5.77 (0.51%)


5.3.2 Adjusting range requirements

The next set of experiments sought to determine the effect of adjusting chapter range requirements on the word lists produced and on the efficiency and reliability of capturing important words with smaller samples. The purpose of this experiment was to assess the possibility that reliable lists could be culled by adjusting the minimum chapter range requirement for inclusion on the important word list. To test this possibility, the chapter range requirement was first reduced to 25%, or 40 out of 157 chapters. The results of this experiment can be seen in Table 16. Comparisons were only made through samples of five textbooks, as reliability (i.e., similarity between sample lists and whole corpus lists) did not appear to be any higher than with the previously used 50% chapter range criterion. As can be seen in Table 16, because of the laxer range requirement, the word lists become a great deal longer than the lists based on the 50% chapter range criterion. For example, while the whole PSYTB corpus word list based on 50% chapter range and no minimum frequency criteria consists of 1,532 words, adjusting the chapter range criterion to 25% produces a list of 3,156 words. However, the reliability has not been improved. There still remains a 23.87% difference, on average, between word lists culled from samples of two textbooks and the whole corpus word list (compared to a 21.27% difference with a 50% chapter range criterion). In fact, little if any difference in reliability can be seen when we adjust our definition of importance to include laxer chapter range requirements.
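In the hypothetical sketch from Section 5.2, these conditions amount to nothing more than changing one parameter:

```python
# The 25% and 75% conditions change only the range threshold of the
# hypothetical helper sketched earlier (corpus_chapters assumed).
laxer_list    = important_lemmas(corpus_chapters, min_chapter_proportion=0.25)
stricter_list = important_lemmas(corpus_chapters, min_chapter_proportion=0.75)
```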


Table 16
Comparison of Lists Produced by Different Samples with Lists Produced by Whole Corpus (25% chapter range requirement)

Columns: minimum frequency required; words meeting criteria in Samples 1-5 and in the whole corpus; then the average words not meeting criteria in both sample and whole corpus (only in sample; only in whole corpus; total difference; SD).

2 TBs vs. 10
  none    3141  3829  2963  3415  3238  3156   457.20 (14.49%)  296.00 (9.38%)   753.20 (23.87%)  113.68 (3.60%)
  20min   3006  3301  2840  3218  3057  3068   342.60 (11.17%)  326.20 (10.63%)  668.80 (21.80%)  45.50 (1.48%)
  40min   2209  2327  2134  2351  2205  2283   221.80 (9.72%)   259.60 (11.37%)  481.40 (21.09%)  20.39 (0.89%)
  60min   1755  1800  1677  1834  1709  1767   180.40 (10.21%)  192.40 (10.89%)  372.80 (21.10%)  35.42 (2.00%)
  80min   1453  1476  1413  1475  1417  1460   142.80 (9.78%)   156.00 (10.68%)  298.80 (20.47%)  11.84 (0.81%)
  100min  1241  1235  1215  1229  1199  1222   129.80 (10.62%)  128.00 (10.47%)  257.80 (21.10%)  13.68 (1.12%)
  200min  671   693   698   666   658   689    64.60 (9.38%)    76.40 (11.09%)   141.00 (20.46%)  7.31 (1.06%)

3 TBs vs. 10
  none    3153  3634  2983  3029  3657  3156   359.60 (11.39%)  224.40 (7.11%)   584.00 (18.50%)  124.45 (3.94%)
  20min   3043  3289  2885  3104  3295  3068   286.00 (9.32%)   230.80 (7.52%)   516.80 (16.84%)  57.23 (1.87%)
  40min   2316  2339  2211  2266  2342  2283   194.80 (9.50%)   183.00 (8.74%)   377.80 (18.24%)  28.17 (4.58%)
  60min   1787  1771  1737  1799  1800  1767   147.00 (8.32%)   135.20 (7.65%)   282.20 (15.97%)  11.45 (0.65%)
  80min   1437  1445  1440  1449  1474  1460   110.20 (7.55%)   121.20 (8.30%)   231.40 (15.85%)  20.38 (1.40%)
  100min  1210  1225  1225  1254  1228  1222   100.00 (8.18%)   93.60 (7.66%)    193.60 (15.84%)  13.97 (1.14%)
  200min  696   677   697   694   687   689    51.00 (7.40%)    49.80 (7.23%)    100.80 (14.63%)  11.08 (1.61%)

4 TBs vs. 10
  none    3453  3118  3077  3324  3309  3156   264.80 (8.39%)   164.60 (5.22%)   429.40 (13.61%)  35.44 (1.12%)
  20min   3189  2973  2973  3119  3149  3068   206.20 (6.72%)   193.60 (6.31%)   399.80 (13.03%)  16.77 (0.55%)
  40min   2316  2213  2225  2278  2329  2283   139.40 (6.11%)   150.20 (6.58%)   289.60 (12.69%)  7.50 (0.33%)
  60min   1793  1727  1761  1780  1828  1767   111.40 (6.30%)   100.60 (5.69%)   212.00 (12.00%)  17.82 (1.01%)
  80min   1455  1426  1434  1461  1483  1460   81.80 (5.60%)    90.00 (6.16%)    171.80 (11.77%)  7.26 (0.50%)
  100min  1213  1204  1206  1227  1239  1222   70.20 (5.74%)    74.40 (6.09%)    144.60 (11.83%)  14.64 (1.20%)
  200min  681   690   660   712   693   689    39.20 (5.69%)    41.00 (5.95%)    80.20 (11.64%)   7.19 (1.04%)

5 TBs vs. 10
  none    3372  3046  3019  3282  3465  3156   219.40 (6.95%)   138.60 (4.39%)   358.00 (11.34%)  49.81 (1.58%)
  20min   3192  2957  2954  3165  3234  3068   180.20 (5.87%)   147.80 (4.82%)   328.00 (10.69%)  26.35 (0.86%)
  40min   2314  2254  2240  2315  2287  2283   120.40 (5.27%)   121.40 (5.32%)   241.80 (10.59%)  9.73 (0.43%)
  60min   1793  1747  1740  1794  1795  1767   88.20 (4.99%)    81.40 (4.61%)    169.60 (9.60%)   10.92 (0.62%)
  80min   1458  1441  1439  1461  1473  1460   65.60 (4.49%)    71.20 (4.88%)    136.80 (9.37%)   5.76 (0.39%)
  100min  1231  1221  1213  1222  1220  1222   55.00 (4.50%)    55.60 (4.55%)    110.60 (9.05%)   6.66 (0.54%)
  200min  688   682   680   704   687   689    30.00 (4.35%)    30.80 (4.47%)    60.80 (8.82%)    8.07 (0.48%)


If decreasing the chapter range requirement does not increase the ability to capture important words, how about a manipulation in the opposite direction: increasing the chapter range requirement? One might assume that this adjustment would increase reliability, as a word with wider range in the whole corpus is likely more important, and thus would maintain a wide range even in samples. To assess whether this assumption might prove correct, I adjusted the chapter requirement to 75% (or occurrence in at least 118 of the 157 chapters in the corpus). The results of this experiment can be seen in Table 17. As can be seen in this table, just as with decreasing the minimum chapter range requirement to 25%, increasing the chapter range requirements to 75% provided no notable increase in my ability to reliably capture important words. In fact, what is notable is the close similarity in the reliability of word lists produced (i.e., the total percentage difference between sample word lists and the whole corpus word lists), regardless of whether the minimum chapter range requirement be 25%, 50%, or 75%. The only notable result of altering the chapter range requirement is in the size of the lists produced (i.e., the laxer the requirement, the larger the list; the stricter the requirement, the smaller the list). With regard to increasing reliability of the word lists, the only variable that appears to consistently affect reliability is sample size: the larger the sample, the better the reliability; the smaller the sample, the lower the reliability.


Table 17
Comparison of Lists Produced by Different Samples with Lists Produced by Whole Corpus (75% chapter range requirement)

Columns: minimum frequency required; words meeting criteria in Samples 1-5 and in the whole corpus; then the average words not meeting criteria in both sample and whole corpus (only in sample; only in whole corpus; total difference; SD).

2 TBs vs. 10
  none    809  989  886  631  795  825   114.25 (13.85%)  69.50 (8.42%)    183.75 (22.27%)  33.59 (4.07%)
  20min   809  989  886  631  795  825   114.25 (13.85%)  69.50 (8.42%)    183.75 (22.27%)  33.59 (4.07%)
  40min   809  989  886  631  795  825   114.25 (13.85%)  69.50 (8.42%)    183.75 (22.27%)  33.59 (4.07%)
  60min   809  982  886  631  795  825   112.75 (13.67%)  69.75 (8.45%)    182.50 (22.12%)  32.92 (3.99%)
  80min   807  961  884  631  792  825   108.50 (13.15%)  72.50 (8.79%)    181.00 (21.94%)  32.55 (3.95%)
  100min  788  902  847  629  777  806   96.75 (11.73%)   74.25 (9.00%)    171.00 (20.73%)  30.47 (3.69%)
  200min  552  614  612  501  555  591   58.00 (9.81%)    65.75 (11.13%)   123.75 (20.94%)  15.45 (2.61%)

3 TBs vs. 10
  none    835  979  808  820  968  825   121.50 (14.73%)  49.00 (5.94%)    170.50 (20.67%)  41.06 (4.98%)
  20min   835  979  808  820  968  825   121.50 (14.73%)  49.00 (5.94%)    170.50 (20.67%)  41.06 (4.98%)
  40min   835  979  808  820  968  825   121.50 (14.73%)  49.00 (5.94%)    170.50 (20.67%)  41.06 (4.98%)
  60min   835  979  808  820  966  825   121.00 (14.67%)  49.00 (5.94%)    170.00 (20.61%)  40.61 (4.92%)
  80min   832  961  806  818  947  825   112.75 (13.67%)  51.25 (6.21%)    164.00 (19.88%)  35.27 (4.28%)
  100min  814  911  786  804  889  806   97.25 (12.07%)   53.25 (6.61%)    150.50 (18.67%)  23.17 (2.87%)
  200min  595  620  587  593  610  591   56.75 (9.60%)    44.75 (7.57%)    101.50 (17.17%)  8.20 (1.39%)

4 TBs vs. 10
  none    941  890  836  918  810  825   79.00 (9.58%)    34.75 (4.21%)    113.75 (13.79%)  20.38 (2.47%)
  20min   941  890  836  918  810  825   79.00 (9.58%)    34.75 (4.21%)    113.75 (13.79%)  20.38 (2.47%)
  40min   941  890  836  918  810  825   79.00 (9.58%)    34.75 (4.21%)    113.75 (13.79%)  20.38 (2.47%)
  60min   940  890  836  918  810  825   78.75 (9.55%)    34.75 (4.21%)    113.50 (13.76%)  20.10 (2.44%)
  80min   932  887  835  912  809  825   76.50 (9.27%)    35.75 (4.33%)    112.25 (13.61%)  19.32 (2.34%)
  100min  884  847  811  865  794  806   64.75 (8.03%)    36.75 (4.56%)    101.50 (12.59%)  10.16 (1.26%)
  200min  611  604  575  625  588  591   39.00 (6.60%)    35.50 (6.01%)    74.50 (12.61%)   6.76 (1.14%)

5 TBs vs. 10
  none    893  796  838  849  934  825   70.75 (8.58%)    30.50 (3.70%)    101.25 (12.27%)  21.59 (2.62%)
  20min   893  796  838  849  934  825   70.75 (8.58%)    30.50 (3.70%)    101.25 (12.27%)  21.59 (2.62%)
  40min   893  796  838  849  934  825   70.75 (8.58%)    30.50 (3.70%)    101.25 (12.27%)  21.59 (2.62%)
  60min   893  796  838  849  934  825   70.75 (8.58%)    30.50 (3.70%)    101.25 (12.27%)  21.59 (2.62%)
  80min   889  796  836  847  926  825   67.50 (8.18%)    30.75 (3.73%)    98.25 (11.91%)   17.76 (2.15%)
  100min  855  779  816  825  877  806   57.00 (7.07%)    31.25 (3.88%)    88.25 (10.95%)   11.54 (1.43%)
  200min  604  574  584  610  608  591   30.50 (5.16%)    29.00 (4.91%)    59.50 (10.07%)   8.90 (1.51%)


5.4 Chapter 5 Conclusions

All of the experiments in Chapter 5 sought to determine the degree to which different size samples (i.e., different sampling rates) with different word list inclusion criteria could represent the lexical distributions in the PSYTB corpus (i.e., capture the important words identified in it). Based on the findings from these experiments, we can reasonably conclude that there is a tremendous amount of lexical variability in introductory psychology books, and that it takes a very large corpus to reasonably capture this variability. Indeed, a sample of nine textbooks (a 90% sampling rate!) cannot even reliably represent the lexical distributions (i.e., reliably capture the important words) in the 10 textbooks of the PSYTB corpus. Adding a 10th textbook consistently alters the list of important words by 3-3.5%.


Chapter 6 General Discussion of Findings

This chapter summarizes the major findings from this dissertation (6.1) as well as the important implications for corpus-based vocabulary research and for those who rely on this research for practical application (6.2). The chapter then concludes with acknowledgement of important limitations of the study (6.3) and possible directions for future research (6.4).

6.1 Summary and Discussion of Major Findings

All findings in the current dissertation can be distilled into one simple yet critically important conclusion: there is likely far greater lexical variability in target language use domains than previous corpus-based vocabulary studies have accounted for. This variability has been demonstrated in the current study through the attempt to capture the total lexical diversity (Chapter 4) as well as through the attempt to identify a stable, reliable set of important words (Chapter 5) in introductory psychology textbooks. Following is a summary of major findings related to each of this dissertation's two major goals. These summaries include discussion which contextualizes findings in relation to previous corpus-based studies of vocabulary that have sought to identify the lexical challenge posed by different target use domains and to identify lists of important words to help focus instructional and learning efforts.

6.1.1 Capturing the lexical diversity in a target use domain

The first goal of this dissertation was to provide an account of the lexical diversity in a restricted register (i.e., introductory textbooks) in one academic discipline (i.e., psychology). Findings presented in Chapter 4 strongly suggest that the domain of introductory psychology textbooks is far more lexically diverse than I have captured with the 3.1 million-word PSYTB corpus. Approximately 3-6% of the total lexical diversity in the PSYTB corpus was not accounted for until the final textbook was added to the corpus, and there is no evidence to conclude that a 10th book provides lexical closure (i.e., saturation) in this corpus. This tremendous lexical diversity in this very restricted, somewhat conventionalized domain is surprising. As a result of this unexpected lexical diversity, it has become clear that it would be premature to propose any estimates of the lexical challenge that learners may encounter in required introductory psychology course readings. That is, any such estimates would be based on an incomplete understanding of the lexical diversity in this domain.

Perhaps more importantly, these findings pose an important methodological problem for corpus-based vocabulary researchers who have proposed estimates of the lexical challenge posed by different use domains. Unfortunately, there is little research regarding the degree to which previously used corpora represent the lexical diversity in their target domains. The only study that I have found that directly investigates this issue is McEnery and Wilson's (1996) brief case study of lexical closure in a corpus of IBM users' manuals. With regard to general English corpora, Evert and Baroni (2006) demonstrate that the BNC has captured a tremendous amount of the lexical diversity in contemporary English. Still, Evert and Baroni's lexical growth curve demonstrates that there is a great deal of lexical diversity that is not being captured by the BNC, even with 75,000,000 words. This can be observed in the fact that lexical growth exceeding 10% is contributed by the last 25,000,000 words of the BNC. Thus, to claim that even this massive corpus represents the lexical diversity in contemporary English is an overstatement. With regard to academic corpora, the degree of lexical diversity captured has simply not been assessed.

Because the assessment of lexical diversity representativeness has not been included in corpus-based studies of lexical challenge, there is reason to question estimates proposed in "How many words are needed to...?" studies. How can we know how many words learners need in order to accomplish certain tasks if we have not accounted for all of the words that they may encounter during these tasks? This dissertation is among the first studies to question and empirically assess this issue, and it has demonstrated that even a seemingly narrow, restricted domain has lexical diversity beyond what might be anticipated. There is no reason to suspect that the same unaccounted-for diversity does not also exist in other target domains that researchers have attempted to describe.

6.1.2 Identifying a stable, reliable list of important words from a target use domain

This dissertation is also among the first to assess the degree to which different size corpora are able to produce stable, reliable word lists that reflect the lexical distributions in a target domain. Findings detailed in Chapter 5 demonstrate a problem: even a set of nine complete introductory psychology textbooks produced lists that are still, on average, approximately 3.5% different from lists produced from a corpus comprising just one additional textbook. Considered a different way, even with a corpus representing a 90% sampling rate (i.e., 9 out of 10 books), I was unable to produce a stable, reliable list of important words.

In previous corpus-based word list research, direct assessments of word list reliability have not been conducted. Instead, the reliability of word lists has simply been assumed, because those lists were identified from corpora that were considered to be large and representative of the types of texts that occur in a given domain. Regarding the validity of corpora that have been used for vocabulary research, corpus compilers and word list designers have spent a great deal of effort detailing corpus-external evidence of corpus representativeness (e.g., topic and/or register coverage). This evidence, it appears, has been considered sufficient support for conclusions that have been drawn based on corpora (e.g., lists of important words). Regarding the word lists themselves, a good deal of evidence has been put forth in order to demonstrate their validity. For example, distributional characteristics, namely range, frequency, and dispersion, have been used to demonstrate the importance of word lists. That is, if words appear frequently, widely, and evenly throughout a corpus, there is evidence of their usefulness and, thus, justification for their inclusion on lists. Post-hoc analysis has also been used to validate word lists. For example, Coxhead (2000) determined that her AWL provided a great deal more coverage of academic texts (i.e., 10% on average) than it did of more general English texts, and thus concluded that the AWL indeed represented important academic vocabulary. Nation (2006) suggested that his 1,000-word bands from the BNC were accurately ordered because the first 1,000 words together provided higher coverage of the BNC and other general English corpora than did the second 1,000 words.

What neither of these studies, nor others like them, has done, however, is to assess the reliability of the words on these lists. For example, if we applied the same distributional requirements used for AWL inclusion to a different corpus of academic writing of comparable dimensions to Coxhead's Academic Corpus, would the list of words produced be the same as the AWL? If we applied the same distributional requirements used to produce the BNC lists to a different general English corpus of comparable dimensions to the BNC, would the first 1,000 words overlap with Nation's BNC 1,000? Would the second 1,000 overlap with the BNC 2,000? The answer to these questions is that we really do not know, because the reliability of these word lists has not been directly assessed. Yet word list users continue to put their faith in these lists as being worthy of instructional and learning focus. This dissertation is among the first to directly assess word list reliability, and it has found that, at least in the domain of introductory psychology textbooks, even with a corpus representing a 90% sampling rate, word lists are neither stable nor reliable.

This finding leads to two possible conclusions. With regard to the current study, I might conclude that a sample of 10 introductory psychology textbooks is simply too small to represent the target domain. With additional textbooks, perhaps I would see stronger evidence that my corpus is representative (i.e., greater stability of the word lists). However, the experiments that I have conducted in this dissertation provide no evidence that this would be the case. A second possibility is that there is simply so much variability in academic writing that any list of important words in a domain, even one as narrowly defined as introductory psychology textbooks, is far more restricted in size and/or reliability than we have considered previously. Either way, the findings from this dissertation have important implications with regard to decisions that are made based on corpus-based vocabulary research. As mentioned above, this dissertation has demonstrated that the PSYTB corpus is not perfectly representative of the target domain. If 10 complete textbooks and over 3.1 million words are not enough to represent lexical variability in such a narrow domain, to what extent are other, often smaller, corpora able to represent lexical variability in broader domains? Results from the current study offer no evidence that they do. In turn, questions must be raised regarding the extent to which word lists based on these corpora are stable and reliable. Findings from the current study offer no evidence that they are. So what does this mean for corpus-based vocabulary researchers and for those who rely on the conclusions these researchers draw? Implications are discussed in the following section.

6.2 Implications of Findings

Findings from this dissertation lead to important implications both for corpus-based vocabulary researchers (6.2.1) and for those who hope to use research findings to focus instructional and learning goals (6.2.2).

6.2.1 Implications for corpus-based vocabulary research

1. Corpora may have to be far larger than has been the norm to represent lexical variability in a target domain. On the surface, we can compare the size of corpora used in my study (3.1 million words) with the size of corpora used to represent other academic domains of various scope. As noted previously, Coxhead (2000) represented the broad domain of academic English with a corpus of 3.5 million wordsonly slightly larger than the PSYTB corpus. Within Coxheads Academic Corpus, she represented the discipline of psychology with approximately 125,000 words. Regarding other specialized corpora, Martinez et al. (2009) represented their domain, academic research articles in agriculture, with fewer

138

than 1,000,000 words, and Chen & Ge (2007) represent medical research articles with fewer than 200,000 words. Based on corpus size (i.e., total running words), then, it would appear that the PSYTB corpus should be at least as representative of its target domain as other academic corpora have been of theirs. Yet this dissertation has demonstrated empirically that the PSYTB corpus does not represent the lexical variability in its target domain. Thus, a critical question is raised: Have previously used corpora been sufficiently representative of their target domains? Unfortunately, this issue has simply not been addressed in previous research. Findings from this dissertation suggest the likelihood that corpora used for vocabulary research have in fact not been adequately representative of the lexical variability in their target domain. 2. Size is not the whole story. In addition to considering corpus size, however, researchers must consider the extent to which corpora reliably represent the lexical distributions in their target domain. Brysbaert & New (2009) note that, with the increasing ease of compiling electronic corpora, corpora in the hundreds of million, or even billion running words will become increasingly common. Compilers of these new corpora note how the larger size and increased consideration of topic or register coverage and balance are evidence of their corporas increased representativeness, and, in turn, the validity of conclusions drawn from them (e.g., BNC: Leech, Rayson, & Wilson, 2001; Corpus of Contemporary American English: Davies, 2009; 2011; Davies & Gardner, 2010). While these corpora may indeed be more representative of lexical distributions than their smaller, sometimes less-principled predecessors were, evidence to this effect has simply not been produced. As well, no evidence has been produced to demonstrate the reliability of conclusions


drawn from these corpora (e.g., estimates of lexical challenge and/or proposed word lists). This is surprising when we consider the painstaking effort and care that have gone into designing and compiling these corpora and creating lists from them.

6.2.2 Implications for those who rely on corpus-based vocabulary research

1. Word list users must understand what word lists are and what they are not. Important word lists are simply lists of words meeting predetermined distributional characteristics in the corpora upon which they are based. And as Schmitt (2010) noted, a word list is only as good as the corpus upon which it is based. Thus, there is a limit to the generalizability of word lists to other texts, even to other texts within the same domain. It is crucial, therefore, that list users understand this and exercise caution in applying word lists to their given context. For example, while the AWL consists of words that meet Coxhead's (2000) pre-determined distributional characteristics within her Academic Corpus, the AWL is not necessarily reliable across all academic texts. This lack of generalizability has been demonstrated on numerous occasions and with domains of various scope (e.g., Hyland & Tse, 2007; Martínez et al., 2009; Vongpumivitch et al., 2009; Wang et al., 2008). This dissertation has further highlighted and extended this concern by demonstrating the challenge of identifying a stable, reliable list of important words that is generalizable even across the narrow domain of introductory psychology textbooks. This is not to say that there do not exist sets of words that are generalizable within a domain or even across domains. However, as demonstrated in the current study, it is likely that such lists are much more restricted, either in terms of size or in terms of reliability, than has been realized or acknowledged.
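To make the notion of predetermined distributional characteristics concrete, the following minimal sketch (in Python) derives a word list from frequency and range counts over a set of chapters. The data structures and thresholds are illustrative assumptions, not the criteria used in this study or in Coxhead (2000); the point is simply that the resulting list is wholly determined by the corpus and the cutoffs supplied to it.

    from collections import Counter

    def important_words(chapters, min_range=0.5, min_freq=100):
        # chapters: a list of tokenized chapters, each a list of lower-cased words.
        # min_range: proportion of chapters a word must occur in.
        # min_freq: minimum total frequency across the whole corpus.
        freq = Counter()           # total occurrences per word
        chapter_range = Counter()  # number of chapters each word occurs in
        for chapter in chapters:
            counts = Counter(chapter)
            freq.update(counts)
            chapter_range.update(counts.keys())  # each word counted once per chapter
        cutoff = min_range * len(chapters)
        return sorted(w for w in freq
                      if freq[w] >= min_freq and chapter_range[w] >= cutoff)

Running such a function over two different samples drawn from the same domain and comparing the resulting lists is, in essence, the stability test applied in this dissertation: if the lists differ substantially, the lists are not reliable, however principled the thresholds.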


2. Vocabulary learning goals should take into account the limitations of important word lists. Because of the demonstrated limitations on word list generalizability, language programs should carefully consider the words that are targeted for instruction. In an English for Specific Academic Purposes (ESAP) context, where learners share a target use domain, word lists might be designed from specialized corpora representing texts from the target domain, without regard to previously proposed word lists such as the GSL and AWL (e.g., Ward, 1999; 2007; Mudraya, 2006). With this approach, list designers would not adopt any shortcomings of previously proposed lists, such as "notable omissions or unusual inclusions" (Nation & Webb, 2010, p. 135), that exist because those lists may not have been designed specifically for a given target domain. There is also evidence suggesting that this method can be effective in a general EAP context where a Sustained Content Language Teaching (SCLT) approach is used (Donley & Reppen, 2001; Murphy & Stoller, 2001). Donley and Reppen, for example, demonstrated that explicit focus on word lists identified by a corpus-based analysis of even a small set of texts from a target domain can lead to development of both disciplinary vocabulary and more general academic vocabulary. In either context, EAP or ESAP, teachers must recognize the limits of word lists. Simply stated, it is difficult to predict a large proportion of the words that students will encounter in required readings at the university. As a result, the word coverage provided by a reliable list of important words is likely to be quite limited. Therefore, students must be armed with strategies for independent vocabulary learning. In addition, learners must be provided substantial opportunities for extensive reading to develop these


vocabulary-learning strategies and to develop their vocabulary knowledge through multiple, meaningful encounters with vocabulary.
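Because the coverage figures discussed here and throughout reduce to a simple proportion of running words, a minimal sketch (in Python, with hypothetical inputs) may be useful:

    def coverage(tokens, word_list):
        # Proportion of running words (tokens) accounted for by word_list.
        listed = set(word_list)
        return sum(token in listed for token in tokens) / len(tokens)

    # Hypothetical example: three of four running words are on the list.
    print(coverage(["the", "neuron", "fires", "rapidly"],
                   ["the", "neuron", "fires"]))  # -> 0.75

A list that looks long in isolation may still leave a large residue of unlisted running words in any particular textbook, which is why the strategy instruction and extensive reading recommended above remain necessary.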

6.3 Acknowledgement of Major Limitations

In assessing the value of the conclusions outlined above, there are a number of limitations that must be acknowledged. Two notable limitations are discussed below.

6.3.1 Generalizability of findings

It must be acknowledged that this case study focused on one narrow target use domain. I have not directly assessed the representativeness of previously used corpora or the reliability of previously proposed word lists. Thus, I am not able to say with certainty whether either the corpora or the proposed word lists have the same limitations that I found in this dissertation study. That said, there is no reason to suspect that they do not; on the contrary, findings from the current study strongly suggest that they might.

6.3.2 The unit of analysis

By using the lemma, as opposed to the word family, as the unit of analysis, I may have introduced an appearance of unrepresentativeness and/or a degree of word list instability that might not be present had the unit of analysis been the word family. For example, consider the AWL word family benefit and all of its members (below). All word forms in this word family would be considered instances of one lexical item: benefit.

benefit
benefits
benefited
benefiting
beneficial
beneficiary
beneficiaries

In the current study, based on the lemma, the word forms making up this word family were categorized as four separate lexical items:

1. BENEFIT (n.): benefit, benefits
2. BENEFIT (v.): benefit, benefits, benefiting, benefited
3. BENEFICIAL (adj.): beneficial
4. BENEFICIARY (n.): beneficiary, beneficiaries

This difference may have had two consequences. First, with regard to lexical diversity, there are simply more lemmas than there are word families. In the example above, there are four lemmas to one word family. Thus, using the lemma as the unit of analysis gives the appearance of greater lexical diversity than would using the word family. As well, achieving closure with word families would likely happen more quickly than it would with lemmas. Once any member of the benefit word family occurred, no subsequent occurrence of any member of this family would be considered as adding to the diversity. In essence, closure of the benefit family would occur at the first instance of any member of the family. In contrast, with lemmas, closure would not occur until each of the four BENEFIT lemmas had occurred. For example, if BENEFIT occurred as a noun, a subsequent occurrence of BENEFIT as a verb would be considered a growth in lexical diversity. The degree of list stability might also have been affected by using the lemma, rather than the word family, as the unit of analysis. Because of its larger number of members, a word family has a greater possibility of occurring in any given text than does

each of the four component lemmas making up this word family. Thus, the combined range and frequency of all members of a word family are certain to be greater than the range and frequency of the individual, component lemmas. In the present study, based on lemmas, I might have found, for example, that the lemma BENEFICIAL (adj.) added instability to my word lists, as it achieved the 50% chapter range identifying it as important only in some samples. However, if the unit of analysis had been the word family, there might have been a greater chance that at least one member of the benefit word family occurred in 50% of the chapters. Thus, while this dissertation highlights the tremendous lexical diversity in academic texts, and the related challenge of identifying a reliable list of important vocabulary, evaluation of these findings must take into account the unit of analysis used. It is possible that previously proposed estimates of lexical challenge and lists of important words based on the word family do not suffer the degree of unreliability that I have found in my study. However, we cannot know the degree of reliability of any estimate or word list until it is empirically assessed, a step that is rarely, if ever, taken in the process of word list development.
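The difference between the two units of analysis can be illustrated with a small sketch (in Python). The groupings are hand-coded for this one example and the tagged tokens are invented; a real analysis would draw on a part-of-speech-tagged corpus and a word-family database such as Bauer and Nation's (1993):

    # Hand-coded groupings for illustration only.
    FAMILY = {w: "benefit" for w in
              ["benefit", "benefits", "benefited", "benefiting",
               "beneficial", "beneficiary", "beneficiaries"]}

    LEMMA = {("benefit", "n"): "BENEFIT (n.)",   ("benefits", "n"): "BENEFIT (n.)",
             ("benefit", "v"): "BENEFIT (v.)",   ("benefits", "v"): "BENEFIT (v.)",
             ("benefited", "v"): "BENEFIT (v.)", ("benefiting", "v"): "BENEFIT (v.)",
             ("beneficial", "adj"): "BENEFICIAL (adj.)",
             ("beneficiary", "n"): "BENEFICIARY (n.)",
             ("beneficiaries", "n"): "BENEFICIARY (n.)"}

    # A three-token, part-of-speech-tagged stretch of running text.
    tokens = [("benefits", "n"), ("beneficial", "adj"), ("benefited", "v")]

    family_types = {FAMILY[word] for word, _ in tokens}  # closes at the first token
    lemma_types = {LEMMA[(word, pos)] for word, pos in tokens}
    print(len(family_types), len(lemma_types))           # -> 1 3

The same three running tokens yield one word-family type but three lemma types: family-based closure is reached immediately, while lemma-based diversity is still growing, which is exactly the asymmetry described above.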

6.4 Directions for Future Related Research

This dissertation has raised critical methodological issues related to the conclusions that can be drawn from corpus-based vocabulary research.

Specifically, it has demonstrated that even a seemingly large corpus was unable to represent the lexical diversity in, or produce a stable, reliable list of important words for, a restricted domain: introductory psychology textbooks. These findings raise the real possibility that the lexical challenge of different language use tasks may be even greater than has been

previously proposed, and that lists of important words may be less reliable than has been assumed. However, these limitations remain only possibilities until they are empirically assessed. Future research can replicate this study with other corpora and word lists representing different domains to assess representativeness and reliability. An additional direction for related future research might address the larger question of academic corpus design for vocabulary research. The design of most academic corpora has been based on the assumption that balanced representation should be operationalized as a collection of equal-size subcorpora representing all relevant external criteria (e.g., X number of words or texts from biology, the same number from chemistry, and the same number from physics). Variability in lexical diversity, however, is not necessarily balanced in this way. We have no evidence that internal lexical variability is constant across disciplines; in fact, we have some evidence to the contrary. With regard to lexical diversity, Biber (2006), for example, demonstrated that some disciplines make use of a larger set of words than others. Thus, we may need corpora of different sizes to represent the lexical diversity in different disciplines. Additionally, different disciplines might require different size corpora to reliably identify the important words in their domain. For example, the current dissertation demonstrates that a corpus larger than 10 complete textbooks is required to represent lexical variability in introductory psychology textbooks. It may be that fewer, or more, textbooks are required to represent the lexical variability in, for example, cultural anthropology or geology. Thus, operationalizing balance as subcorpora of equal dimensions may be misguided. Future research can investigate the dimensions of disciplinary corpora required to represent


lexical distributions in their target domain, and account for any differences in the design of macro-disciplinary or general academic English corpora.


Chapter 7

Conclusion

This dissertation has had two primary goals: to assess the vocabulary challenge English learners might face in required reading in their introductory psychology classes, and to assess the size and composition of the corpus required to reliably capture the important words in these readings. Related to the first goal, I have demonstrated that the PSYTB corpus does not represent the full range of lexical diversity that students may encounter in introductory psychology reading. As a result, this study has not produced evidence allowing me to confidently propose an estimate of the number of words that will allow students to achieve adequate coverage of a psychology textbook that they might be assigned to read. What this dissertation has done, however, is highlight an important methodological problem involved in providing estimates of the lexical challenge posed by target use domains: corpus-based estimates of lexical challenge are only as meaningful as the corpora are representative of lexical diversity in a target domain. Related to the second goal, I have demonstrated the significant challenge of producing a stable, reliable list of important words for this domain. Even a 3.1 million-word corpus of 10 whole introductory psychology textbooks was unable to yield such a list. Further, no findings from this dissertation provide evidence that even a larger sample (doubling, tripling, or further multiplying the number of textbooks in the corpus) would do so. This finding brings into question whether a corpus of any size or composition


would allow me to produce a stable, reliable set of important words for introductory psychology textbooks. The findings from this dissertation have much broader implications for corpus-based vocabulary research that seeks either to understand the lexical challenge faced by learners or to identify lists of words meriting valuable teaching and learning time. That is, the findings highlight fundamental methodological issues that must be addressed. Specifically, this case study has provided support for Biber's (1993) contention that corpus-internal variables must be included in operationalizing corpus representativeness. Because such variables have yet to be considered in corpus-based vocabulary research, previously proposed estimates and word lists have potential limitations of which teachers and learners must be aware. Until vocabulary researchers include the additional step of corpus-internal assessment of their corpora's lexical representativeness, we must be extremely cautious about any conclusions we draw from these corpora regarding the size of vocabulary required or the actual words worthy of focus for a particular target language use.


REFERENCES

Adelson-Goldstein, J., & Shapiro, N. (2008). Oxford picture dictionary (2nd ed.). New York: Oxford.

Adolphs, S., & Schmitt, N. (2003). Lexical coverage of spoken discourse. Applied Linguistics, 24(4), 425-438.

Alderson, J. C. (2007). Judging the frequency of English words. Applied Linguistics, 28(3), 383-409.

Anderson, N. (2008). Practical English teaching: Reading. New York: McGraw-Hill.

Atkins, S., Clear, J., & Ostler, N. (1992). Corpus design criteria. Literary and Linguistic Computing, 7, 1-16.

Baayen, H. (2001). Word frequency distributions. New York: Kluwer Academic.

Baker, M. (1988). Sub-technical vocabulary and the ESP teacher: An analysis of some rhetorical items in medical journal articles. Reading in a Foreign Language, 4(2), 91-105.

Bauer, L., & Nation, I. S. P. (1993). Word families. International Journal of Lexicography, 6, 253-279.

Becka, J. (1972). The lexical composition of specialized texts and its quantitative aspect. Prague Studies in Mathematical Linguistics, 4, 47-64.

Belica, C. (1996). Analysis of temporal change in corpora. International Journal of Corpus Linguistics, 1(1), 61-74.

Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge University Press.


Biber, D. (1993). Representativeness in corpus design. Literary & Linguistic Computing, 8, 243-257.

Biber, D. (2006). University language: A corpus-based study of spoken and written registers. Herndon, VA: John Benjamins.

Biber, D., Conrad, S., & Cortes, V. (2004). If you look at...: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25, 371-405.

Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating structure and use. Cambridge, UK: Cambridge University Press.

Biber, D., Conrad, S., Reppen, R., Byrd, P., Helt, M., Clark, V., Cortes, V., Csomay, E., & Urzua, A. (2004). Representing language use in the university: Analysis of the TOEFL 2000 spoken and written academic language corpus (ETS TOEFL Monograph Series, MS-25). Princeton, NJ: Educational Testing Service.

Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. New York: Longman.

Bowker, L., & Pearson, J. (2002). Working with specialized language: A practical guide to using corpora. London, UK: Routledge.

Bramki, D., & Williams, R. (1984). Lexical familiarization in economics text, and its pedagogic implications in reading comprehension. Reading in a Foreign Language, 2, 169-181.


Brysbaert, M., & New, B. (2009). Moving beyond Kucera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977-990.

Burgmeier, A., & Zimmerman, C. (2007). Inside reading 1 student book pack: The Academic Word List in context. Oxford, UK: Oxford University Press.

Cain, K. (2007). Syntactic awareness and reading ability: Is there any evidence for a special relationship? Applied Psycholinguistics, 29, 679-694.

Campion, M., & Elley, W. (1971). An academic vocabulary list. Wellington, New Zealand: Council for Educational Research.

Carkin, S. (2001). Pedagogic discourse in introductory classes: Multidimensional analysis of textbooks and lectures in macroeconomics and biology. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff.

Carkin, S. (2005). English for academic purposes. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 85-98). Mahwah, NJ: Lawrence Erlbaum.

Carnegie Foundation Commission on Higher Education. (2009, October 12). The Carnegie Classification of Institutions of Higher Education. Retrieved from http://classifications.carnegiefoundation.org/

Carroll, J. B., Davies, P., & Richman, B. (1971). Word frequency book. New York: American Heritage.

Carson, J. G., Chase, N. D., Gibson, S. U., & Hargrove, M. F. (1992). Literacy demands of the undergraduate curriculum. Reading Research and Instruction, 31, 25-50.


Chen, Q., & Ge, G. (2007). A corpus-based lexical study on frequency and distribution of Coxhead's AWL word families in medical research articles (RAs). English for Specific Purposes, 26, 502-514.

Chung, T. M. (2003). A corpus comparison approach for terminology extraction. Terminology, 9(2), 221-246.

Chung, T. M., & Nation, I. S. P. (2004). Identifying technical vocabulary. System, 32(2), 251-263.

Cobb, T. (2010). Learning about language and learners from computer programs. Reading in a Foreign Language, 22(1), 181-200.

Cobb, T., & Horst, M. (2001). Reading academic English: Carrying learners across the lexical threshold. In J. Flowerdew & M. Peacock (Eds.), The English for academic purposes curriculum (pp. 315-329). Cambridge: Cambridge University Press.

The College Board. (2010). CLEP Introductory Psychology: At a glance. Retrieved from http://clep.collegeboard.org/clep-introductory-psychology-glance

Conklin, K., & Schmitt, N. (2008). Formulaic sequences: Are they processed more quickly than nonformulaic language by native and nonnative speakers? Applied Linguistics, 29(1), 72-89.

Cowan, J. R. (1974). Lexical and syntactic research for the design of EFL reading materials. TESOL Quarterly, 8(4), 389-399.

Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213-238.


Coxhead, A., & Hirsh, D. (2007). A pilot science word list for EAP. Revue Française de linguistique appliquée, XII(2), 65-78.

Darrell, B. (1980). A faculty survey of undergraduate reading and writing needs. Peabody Journal of Education, 57(2), 85-93.

Davies, M., & Gardner, D. (2010). A frequency dictionary of contemporary American English. New York: Routledge.

Donley, M., & Reppen, R. (2001). Using corpus tools to highlight academic vocabulary in SCLT. TESOL Journal, Autumn, 3-5.

Durrant, P. (2009). Investigating the viability of a collocation list for students of English for academic purposes. English for Specific Purposes, 28, 157-169.

Durrant, P., & Doherty, A. (2010). Are high-frequency collocations psychologically real? Investigating the thesis of collocational priming. Corpus Linguistics and Linguistic Theory, 6(2), 125-155.

Eldridge, J. (2008). No, there isn't an academic vocabulary, but... A reader responds to K. Hyland and P. Tse's "Is there an academic vocabulary?" TESOL Quarterly, 42(1), 109-113.

Erman, B., & Warren, B. (2000). The idiom principle and the open choice principle. Text, 20(1), 29-62.

Evert, S., & Baroni, M. (2006). Testing the extrapolation quality of word frequency models. Proceedings of Corpus Linguistics 2005. Retrieved from http://www.corpus.bham.ac.uk/PCLC/


Farrell, P. (1990). Vocabulary in ESP: A lexical analysis of the English of electronics and a study of semitechnical vocabulary. CLCS Occasional Paper No. 25, Trinity College.

Flowerdew, J. (1993). Concordancing as a tool in course design. System, 21(2), 231-244.

Francis, W., & Kucera, H. (1982). Frequency analysis of English usage: Lexicon and grammar. Boston: Houghton Mifflin.

Gardner, D. (2007). Validating the construct of word in applied corpus-based vocabulary research: A critical survey. Applied Linguistics, 28(2), 241-265.

Ghadessy, P. (1979). Frequency counts, words lists, and materials preparation: A new approach. English Teaching Forum, 17, 24-27.

Grabe, W. (1991). Current developments in reading in a second language. TESOL Quarterly, 25, 375-406.

Grabe, W. (2004). Research on L2 reading instruction. Annual Review of Applied Linguistics, 24, 44-69.

Grabe, W. (2009). Reading in a second language: Moving from theory to practice. New York: Cambridge University Press.

Grabe, W., & Stoller, F. (2002). Teaching and researching reading. New York: Longman.

Granger, S., & Paquot, M. (2009). In search of a general academic vocabulary: A corpus-driven study. Paper presented at the international conference Options and Practices of L.S.A.P Practitioners, 7-8 February 2009, University of Crete, Heraklion, Crete. Retrieved from http://www.uclouvain.be/cps/ucl/doc/adri/documents/In_search_of_a_general_academic_english.pdf


Grant, A., & Ginther, L. (1996). A review of the academic needs of native English-speaking college students in the United States. Princeton, NJ: Educational Testing Service.

Griggs, R. A., Jackson, S. L., Christopher, A. N., & Marek, P. (1999). Introductory psychology textbooks: An objective analysis and update. Teaching of Psychology, 26, 182-189.

Hancioglu, N., Neufeld, S., & Eldridge, J. (2008). Through the looking glass and into the land of lexico-grammar. English for Specific Purposes, 27, 459-479.

Hazenberg, S., & Hulstijn, H. (1996). Defining a minimal receptive second-language vocabulary for non-native university students: An empirical investigation. Applied Linguistics, 17, 145-163.

Heatley, A., & Nation, P. (1994). Range. Victoria University of Wellington, NZ. [Computer program, available at http://www.vuw.ac.nz/lals/.]

Hirsh, D., & Nation, I. S. P. (1992). What vocabulary size is needed to read unsimplified texts for pleasure? Reading in a Foreign Language, 8, 689-696.

Hoey, M. (2005). Lexical priming: A new theory of words and language. London: Routledge.

Hu, M., & Nation, I. S. P. (2000). Unknown vocabulary density and reading comprehension. Reading in a Foreign Language, 13, 403-430.

Huntley, H. (2005). Essential academic vocabulary: Mastering the complete academic word list. New York: Houghton Mifflin.

Hwang, K., & Nation, I. S. P. (1995). Where would general service vocabulary stop and special purposes vocabulary begin? System, 23(1), 35-41.


Hyland, K., & Tse, P. (2007). Is there an academic vocabulary? TESOL Quarterly, 41(2), 235-253.

Johansson, S. (1978). A computer archive of modern English texts. What? How? Why? When? Språk og språkundervisning, 11(4), 70-73.

Johns, A. (1981). Necessary English: A faculty survey. TESOL Quarterly, 15(1), 51-57.

Johns, A. (1997). Text, role, and context: Developing academic literacies. Cambridge, UK: Cambridge University Press.

Kennedy, G. (1998). An introduction to corpus linguistics. London, UK: Longman.

Laufer, B. (1989). What percentage of text lexis is essential for comprehension? In C. Lauren & M. Nordman (Eds.), Special language: From humans thinking to thinking machines (pp. 316-323). Bristol, UK: Multilingual Matters.

Laufer, B. (1992). How much lexis is necessary for reading comprehension? In H. Bejoint & P. Arnaud (Eds.), Vocabulary and applied linguistics (pp. 126-132). New York: Macmillan.

Laufer, B. (1997). What's in a word that makes it hard or easy? Intralexical factors affecting the difficulty of vocabulary acquisition. In M. McCarthy & N. Schmitt (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 140-155). Cambridge, UK: Cambridge University Press.

Laufer, B., & Ravenhorst-Kalovski, G. (2010). Lexical threshold revisited: Lexical text coverage, learners' vocabulary size and reading comprehension. Reading in a Foreign Language, 22, 15-30.

Leech, G., Rayson, P., & Wilson, A. (2001). Word frequencies in written and spoken English: Based on the British National Corpus. London, UK: Pearson.


Li, Y., & Qian, D. D. (2010). Profiling the Academic Word List (AWL) in a financial corpus. System, 38, 402-411.

Lorge, I. (1949). Reading and readability. Teachers College Record, 51(2), 90-97.

Lynn, R. W. (1973). Preparing word lists: A suggested method. RELC Journal, 4(1), 25-32.

Martínez, I., Beck, S., & Panza, C. (2009). Academic vocabulary in agricultural research articles: A corpus-based study. English for Specific Purposes, 28(3), 183-198.

McEnery, T., & Wilson, A. (1996). Corpus linguistics. Edinburgh: Edinburgh University Press.

McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced resource book. New York: Routledge.

Millar, N., & Budgell, B. (2008). The language of public health: A corpus-based analysis. Journal of Public Health, 16, 369-374.

Ming-Tzu, K., & Nation, I. S. P. (2004). Word meaning in academic English: Homography in the Academic Word List. Applied Linguistics, 25(3), 291-314.

Mudraya, O. (2006). Engineering English: A lexical frequency instructional model. English for Specific Purposes, 25(2), 235-256.

Murphy, J., & Stoller, F. (2001). Sustained content language teaching: An emerging definition. TESOL Journal, Autumn, 3-5.

Nagy, W. E., & Anderson, R. C. (1984). How many words are there in printed school English? Reading Research Quarterly, 19, 304-330.


Nagy, W., Anderson, R., Schommer, M., Scott, J. A., & Stallman, A. (1989). Morphological families in the internal lexicon. Reading Research Quarterly, 24, 262-281.

Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.

Nation, I. S. P. (2004). A study of the most frequent word families in the British National Corpus. In P. Bogaards & B. Laufer (Eds.), Vocabulary in a second language (pp. 3-14). Philadelphia, PA: John Benjamins.

Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? Canadian Modern Language Review, 63(1), 59-82.

Nation, I. S. P., & Coxhead, A. (2001). The specialised vocabulary of English for academic purposes. In J. Flowerdew & M. Peacock (Eds.), Research perspectives on English for academic purposes (pp. 252-267). Cambridge: Cambridge University Press.

Nation, I. S. P., & Waring, R. (1997). Vocabulary size, text coverage and word lists. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 6-19). Cambridge: Cambridge University Press.

Nation, I. S. P., & Webb, S. (2011). Researching and analyzing vocabulary. Boston, MA: Heinle.

Oh, J., Lee, J., Lee, K., & Choi, K. (2000). Japanese term extraction using dictionary hierarchy and a machine translation system. Terminology, 6, 287-311.

Praninskas, J. (1972). American university word list. London: Longman.


Qian, D. (2002). Investigating the relationship between vocabulary knowledge and academic reading performance: An assessment perspective. Language Learning, 52, 513-536.

Rosenfeld, M., Leung, S., & Oltman, P. K. (2001). Identifying the reading, writing, speaking, and listening tasks important for academic success at the undergraduate and graduate levels (TOEFL Monograph Series MS-21). Princeton, NJ: Educational Testing Service.

Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual. New York: Palgrave Macmillan.

Schmitt, N., & Schmitt, D. (2005). Focus on vocabulary: Mastering the Academic Word List. New York: Pearson.

Schmitt, N., & Zimmerman, C. (2002). Derivative word forms: What do learners know? TESOL Quarterly, 36(2), 145-171.

Schmitt, N., Jiang, X., & Grabe, W. (2011). The percentage of words known in a text and reading comprehension. Modern Language Journal, 95(1), 26-43.

Sheorey, R., & Mokhtari, K. (2001). Differences in the metacognitive awareness of reading strategies among native and non-native readers. System, 29, 431-449.

Simpson-Vlach, R., & Ellis, N. (2010). An academic formulas list: New methods in phraseology research. Applied Linguistics, 31(4), 487-512.


Shin, D., & Nation, P. (2008). Beyond single words: The most frequent collocations in spoken English. ELT Journal, 62(4), 339-348.

Shiotsu, T., & Weir, C. J. (2007). The relative significance of syntactic knowledge and vocabulary breadth in the prediction of reading comprehension test performance. Language Testing, 24, 99-128.

Sinclair, J. (1991). Corpus, concordance, collocation: Describing English language. Oxford: Oxford University Press.

Sutarsyah, C., Nation, P., & Kennedy, G. (1994). How useful is EAP vocabulary for ESP? A corpus based case study. RELC Journal, 25(2), 34-50.
Thorndike, E. (1921). Word knowledge in the elementary school. Teachers College Record, 22(4), 334-370.

Thorndike, E., & Lorge, I. (1944). The teacher's word book of 30,000 words. New York: Teachers College Press.

Underwood, G., Schmitt, N., & Galpin, A. (2004). The eyes have it: An eye-movement study into the processing of formulaic sequences. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing, and use (pp. 153-172). Philadelphia: John Benjamins.

Upton, T. (2004). Reading skills for success: A guide to academic texts. Ann Arbor, MI: The University of Michigan Press.

Vongpumivitch, V., Huang, J., & Chang, Y. (2009). Frequency analysis of the words in the Academic Word List (AWL) and non-AWL content words in applied linguistics research papers. English for Specific Purposes, 28(1), 33-41.

Wang, J., Liang, S., & Ge, G. (2008). Establishment of a medical academic word list. English for Specific Purposes, 27(4), 442-458.

Ward, J. W. (1999). How large a vocabulary do EAP engineering students need? Reading in a Foreign Language, 12(2), 309-324.

Webb, S., & Rodgers, M. P. H. (2009a). The lexical coverage of movies. Applied Linguistics, 30, 407-427.

Webb, S., & Rodgers, M. P. H. (2009b). The vocabulary demands of television programs. Language Learning, 59, 335-366.

West, M. (1953). A general service list of English words. London: Longman, Green and Co.

Woodward-Kron, R. (2008). More than just jargon: The nature and role of specialist language in learning disciplinary knowledge. Journal of English for Academic Purposes, 7(4), 234-249.

Xue, G., & Nation, I. S. P. (1984). A university word list. Language Learning and Communication, 3(2), 215-229.

Yang, H. (1986). A new technique for identifying scientific/technical terms and describing science texts. Literary and Linguistic Computing, 1, 93-103.


APPENDIX A

Carnegie Classifications, copied from: http://classifications.carnegiefoundation.org/descriptions/ugrad_program.php

A&S-F/NGC: Arts & sciences focus, no graduate coexistence. According to the degree data, at least 80 percent of bachelor's degree majors were in the arts and sciences, and no graduate degrees were awarded in fields corresponding to undergraduate majors.

A&S-F/SGC: Arts & sciences focus, some graduate coexistence. At least 80 percent of bachelor's degree majors were in the arts and sciences, and graduate degrees were observed in some of the fields corresponding to undergraduate majors (but less than half).

A&S-F/HGC: Arts & sciences focus, high graduate coexistence. At least 80 percent of bachelor's degree majors were in the arts and sciences, and graduate degrees were observed in at least half of the fields corresponding to undergraduate majors.

A&S+Prof/NGC: Arts & sciences plus professions, no graduate coexistence. According to the degree data, 60-79 percent of bachelor's degree majors were in the arts and sciences, and no graduate degrees were awarded in fields corresponding to undergraduate majors.

A&S+Prof/SGC: Arts & sciences plus professions, some graduate coexistence. 60-79 percent of bachelor's degree majors were in the arts and sciences, and graduate degrees were observed in some of the fields corresponding to undergraduate majors (but less than half).

A&S+Prof/HGC: Arts & sciences plus professions, high graduate coexistence. 60-79 percent of bachelor's degree majors were in the arts and sciences, and graduate degrees were observed in at least half of the fields corresponding to undergraduate majors.

Bal/NGC: Balanced arts & sciences/professions, no graduate coexistence. According to the degree data, bachelor's degree majors were relatively balanced between arts and sciences and professional fields (41-59 percent in each), and no graduate degrees were awarded in fields corresponding to undergraduate majors.

Bal/SGC: Balanced arts & sciences/professions, some graduate coexistence. Bachelor's degree majors were relatively balanced between arts and sciences and professional fields (41-59 percent in each), and graduate degrees were observed in some of the fields corresponding to undergraduate majors (but less than half).

Bal/HGC: Balanced arts & sciences/professions, high graduate coexistence. Bachelor's degree majors were relatively balanced between arts and sciences and professional fields (41-59 percent in each), and graduate degrees were observed in at least half of the fields corresponding to undergraduate majors.


Prof+A&S/NGC: Professions plus arts & sciences, no graduate coexistence. According to the degree data, 60-79 percent of bachelor's degree majors were in professional fields (such as business, education, engineering, health, and social work), and no graduate degrees were awarded in fields corresponding to undergraduate majors.

Prof+A&S/SGC: Professions plus arts & sciences, some graduate coexistence. 60-79 percent of bachelor's degree majors were in professional fields, and graduate degrees were observed in some of the fields corresponding to undergraduate majors (but less than half).

Prof+A&S/HGC: Professions plus arts & sciences, high graduate coexistence. 60-79 percent of bachelor's degree majors were in professional fields, and graduate degrees were observed in at least half of the fields corresponding to undergraduate majors.

Prof-F/NGC: Professions focus, no graduate coexistence. According to the degree data, at least 80 percent of bachelor's degree majors were in professional fields (such as business, education, engineering, health, and social work), and no graduate degrees were awarded in fields corresponding to undergraduate majors.

Prof-F/SGC: Professions focus, some graduate coexistence. At least 80 percent of bachelor's degree majors were in professional fields, and graduate degrees were observed in some of the fields corresponding to undergraduate majors (but less than half).


APPENDIX B

Academic Institutions from which Textbooks were Selected

Carnegie Classification   Institution                                 Location          Public/Private
A&S-F/NGC                 Albion College                              Michigan          Private
A&S-F/NGC                 Beloit College                              Wisconsin         Private
A&S-F/SGC                 University of North Carolina, Asheville     North Carolina    Public
A&S-F/SGC                 Bard College                                New York          Private
A&S-F/HGC                 Stanford University                         California        Private
A&S-F/HGC                 Vanderbilt University                       Tennessee         Private
A&S+Prof/NGC              Augustana College                           Illinois          Private
A&S+Prof/NGC              Doane College                               Nebraska          Private
A&S+Prof/SGC              California State University, Bakersfield    California        Public
A&S+Prof/SGC              Hood College                                Maryland          Private
A&S+Prof/HGC              Georgetown University                       Washington D.C.   Private
A&S+Prof/HGC              University of Missouri, Kansas City         Missouri          Public
Bal/NGC                   Asbury College                              Kentucky          Private
Bal/NGC                   Carthage College                            Wisconsin         Private
Bal/SGC                   Butler University                           Indiana           Private
Bal/SGC                   California State University, Chico          California        Public
Bal/HGC                   Georgia State University                    Georgia           Public
Bal/HGC                   Northern Illinois University                Illinois          Public
Prof+A&S/NGC              College of the Ozarks                       Missouri          Private
Prof+A&S/NGC              Eckerd College                              Florida           Private
Prof+A&S/SGC              Ball State University                       Indiana           Public
Prof+A&S/SGC              Northern Arizona University                 Arizona           Public
Prof+A&S/HGC              Clemson University                          South Carolina    Public
Prof+A&S/HGC              Montana State University                    Montana           Public
Prof-F/NGC                Mayville State University                   North Dakota      Public
Prof-F/NGC                Becker College                              Massachusetts     Private
Prof-F/SGC                City University                             Washington        Private
Prof-F/SGC                Ferris State University                     Michigan          Public

