Contents
1. Introduction
2. Corpus
3. Methodology
4. Sample list
5. Discussion
Introduction
Motivation
Significance: collocations are essential for successful language processing and language use
Dilemma: collocations are instantly recognized by native speakers but remain difficult for learners to acquire and use properly
Proficiency: collocations make fluent language possible, support comprehension and create a register
Objective
To compile a list of the most frequent and pedagogically relevant collocations in written academic English in order to devise efficient learning, teaching and assessment resources
Tool
Pearson International Corpus of Academic English (PICAE)
PICAE
WRITTEN
Curricular: Textbook, Article
Extracurricular
SPOKEN
Curricular: Lecture, Seminar
Extracurricular: TV broadcasts, Radio broadcasts
333 documents
From 4 academic disciplines: Humanities, Social Science, Natural and Formal Science, Professions and Applied Science
Covering 28 academic subjects: 7 per academic discipline
Social Science Subject Anthropology Archaeology Tokens 413,237 184,089 861,656 520,395
Natural / Formal Science Subject Earth Sciences Chemistry Physics Computer sciences Tokens
1,562,046 Cultural studies 728,532 627,951 602,233 198,165 5,520,762 Gender studies Politics Psychology Sociology Total
1,124,097 Engineering 1,134,950 Health 1,090,800 Mathematics 295,565 sciences 1,429,679 Media 1,560,745 Biology 858,597 studies 1,500,485 1,832,588 Ecology 6,463,510 Total 239,787 6,026,100 Law Total 1,962,002 8,243,572
Methodology
Overview
Step One: Douglas Biber's quantitative analysis of PICAE
Result: list of c. 130,000 collocations and extended phrases
Step Two: manual vetting of the initial collocation list
Result: list of c. 3,500 collocations
Step Three: expert judgment
Result: list of c. 3,000 pedagogically relevant collocations
Collocate: a single word that tends to co-occur within a span of 3 words of the target word
Minimum frequency: 1 per million words
Occurring in at least 5 different texts
Mutual Information score: at least 3
T-score: at least 2
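The MI and t-score thresholds above can be illustrated with association formulas that are common in corpus linguistics. This is a minimal sketch only. The slides do not give the exact formulas used in Biber's analysis, so the following are assumptions:
- Expected frequency is E = f(target) × f(collocate) × span / N, with span = 6 slots (three on each side).
- MI = log2(O / E).
- t = (O − E) / √O.

All names (`CollocationStats`, `mi_score`, `t_score`, `passes_thresholds`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CollocationStats:
    observed: int      # co-occurrences of target and collocate within the span
    f_target: int      # corpus frequency of the target word
    f_collocate: int   # corpus frequency of the collocate
    corpus_size: int   # total number of tokens in the corpus
    n_texts: int       # number of distinct texts containing the pair


def expected(s: CollocationStats, span: int = 6) -> float:
    # Co-occurrences expected by chance, scaled by the window size
    # (3 slots left + 3 slots right); this scaling is an assumption.
    return s.f_target * s.f_collocate * span / s.corpus_size


def mi_score(s: CollocationStats, span: int = 6) -> float:
    # Mutual Information: log2 of observed over expected co-occurrences
    return math.log2(s.observed / expected(s, span))


def t_score(s: CollocationStats, span: int = 6) -> float:
    # t-score: observed minus expected, divided by sqrt(observed)
    return (s.observed - expected(s, span)) / math.sqrt(s.observed)


def per_million(count: int, corpus_size: int) -> float:
    return count * 1_000_000 / corpus_size


def passes_thresholds(s: CollocationStats, span: int = 6) -> bool:
    # Thresholds from the slide, read as minimum values
    return (
        per_million(s.observed, s.corpus_size) >= 1
        and s.n_texts >= 5
        and mi_score(s, span) >= 3
        and t_score(s, span) >= 2
    )


pair = CollocationStats(observed=30, f_target=500, f_collocate=400,
                        corpus_size=1_000_000, n_texts=12)
print(round(mi_score(pair), 2), round(t_score(pair), 2), passes_thresholds(pair))
# expected: 4.64 5.26 True
```

MI rewards exclusive pairings even when they are rare. The t-score rewards pairs that are frequent in absolute terms. Requiring both is a common way to balance the two.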
Content words (nouns, verbs, adjectives, adverbs)
Frequency of at least 5 per million words
Distribution across at least 5 different texts
Excluding General Service List (GSL) headwords
Using a stop list to exclude frequent function words with purely grammatical meaning
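The criteria above can be expressed as a single filter. This is a minimal sketch built on three assumptions:
- The criteria are read here as applying to the target words.
- The GSL headwords and the stop list are supplied by the caller as plain sets.
- The thresholds are minimums.

The `WordStats` type and `is_target_word` function are hypothetical.

```python
from dataclasses import dataclass

# Simplified content-word tags (hypothetical tag names)
CONTENT_POS = {"n", "v", "adj", "adv"}


@dataclass
class WordStats:
    lemma: str
    pos: str       # simplified part-of-speech tag
    count: int     # raw frequency in the corpus
    n_texts: int   # number of texts the word occurs in


def is_target_word(w: WordStats, corpus_size: int,
                   gsl_headwords: set[str], stop_list: set[str]) -> bool:
    if w.pos not in CONTENT_POS:
        return False                      # nouns, verbs, adjectives, adverbs only
    if w.lemma in stop_list or w.lemma in gsl_headwords:
        return False                      # drop function words and GSL headwords
    per_million = w.count * 1_000_000 / corpus_size
    return per_million >= 5 and w.n_texts >= 5


candidates = [
    WordStats("hypothesis", "n", 40, 30),   # 20 per million, 30 texts
    WordStats("make", "v", 5000, 300),      # GSL headword
    WordStats("albeit", "adv", 5, 5),       # only 2.5 per million
]
print([w.lemma for w in candidates
       if is_target_word(w, 2_000_000, {"make"}, {"the", "of"})])
```

Here the GSL is represented by a tiny stand-in set. A real run would load the full headword list.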
Location relative to the target word (-3, -2, -1, +1, +2, +3)
Normed overall frequency (per million words)
Normed frequency (per million words) in each academic discipline
Number of texts it occurs in
Mutual Information score
T-score
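The positional and normed-frequency fields above can be sketched as follows. This assumes an already tokenized text and simple whitespace tokens. Each per-discipline count is normalised by that discipline's own token total. The function names are hypothetical and not taken from the original analysis.

```python
from collections import Counter


def positional_collocates(tokens: list[str], target: str, span: int = 3) -> Counter:
    """Count (collocate, offset) pairs around every occurrence of target.

    Offsets run from -span to +span, skipping 0 (the target itself),
    which gives the -3 to +3 locations recorded in the list.
    """
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        for offset in range(-span, span + 1):
            j = i + offset
            if offset == 0 or j < 0 or j >= len(tokens):
                continue
            counts[(tokens[j], offset)] += 1
    return counts


def per_million(count: int, corpus_size: int) -> float:
    return count * 1_000_000 / corpus_size


def normed_by_discipline(counts: dict[str, int],
                         sizes: dict[str, int]) -> dict[str, float]:
    # Normalise each discipline's raw count by that discipline's token total
    return {d: per_million(counts.get(d, 0), size) for d, size in sizes.items()}


tokens = "remote sensing data are collected by remote sensing".split()
print(positional_collocates(tokens, "sensing")[("remote", -1)])  # 2
```

Normalising per discipline rather than over the whole corpus keeps a collocation that is common in one smaller discipline from looking rare.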
Most Normed Freq Freqper MI Position million score 1 1 2 1 2 1 1 1 1 1 1 2 8.12 8.12 8.08 8.08 8.08 8.08 8.08 8.08 8.03 8.03 8.03 8.03 8.03 7.99 7.99 6.39 5.93 4.82 7.17 3.72 5.52 3.21 6.52 6.00 6.87 3.99 4.96 14.45 5.45 4.61
#of tscore texts PAS 13.29 13.23 12.94 13.32 12.40 13.12 11.97 13.27 13.17 13.27 12.54 12.95 13.38 13.04 12.79 45 79 55 32 102 67 69 64 42 23 59 71 10 71 12
HM
SS 8.67 8.31 16.44 18.79 8.67 9.39 10.48 12.46 5.42 0.72 19.51 4.70 0.00 5.96 3.79
NFS 3.22 4.42 5.43 5.43 7.84 12.87 7.24 3.82 3.42 28.96 3.22 18.91 35.80 7.84 1.01
16.56 0.21 11.14 7.34 5.00 1.71 9.56 6.85 8.57 8.85 3.43 6.28 5.42 0.14 5.66 7.76 5.45 3.35 5.45 6.29 1.47 2.31 4.40 0.00
including information interact range community experimental data issues measured remote affect information society sensing
17.56 1.89
1 1 1
Tagging the initial collocation list with OpenNLP, using a simplified set of tags
Converting the tagged output into a specified format
Columns: Pre-C tag, Academic word, AW tag, Post-collocate, Post-C tag
Pre-C tags: adj, prep, xxx, adj
Academic words (AW tag): concepts (n), region (n), areas (n), culture (n), community (n), organization (n), research (n), slightly (adv), disposal (n), obtain (v), diversity (n), independent (adj)
Post-collocates and Post-C tags: xxx, n, vpp, more, adv, variables
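The conversion step can be sketched as a mapping from a full tag set to the simplified tags seen above (n, v, vpp, adj, adv, prep, xxx). This rests on two assumptions. First, the tagger emitted Penn Treebank tags, as OpenNLP's classic English models do. Second, the simplified tags group Penn tags as shown below. OpenNLP itself (a Java library) is not called here. `simplify_tag` and `to_row` are hypothetical helpers.

```python
def simplify_tag(penn_tag: str) -> str:
    """Map a Penn Treebank tag to a simplified tag (hypothetical mapping)."""
    if penn_tag == "VBN":
        return "vpp"            # past participle, checked before other VB tags
    if penn_tag.startswith("NN"):
        return "n"              # NN, NNS, NNP, NNPS
    if penn_tag.startswith("VB"):
        return "v"              # VB, VBD, VBG, VBP, VBZ
    if penn_tag.startswith("JJ"):
        return "adj"            # JJ, JJR, JJS
    if penn_tag.startswith("RB"):
        return "adv"            # RB, RBR, RBS
    if penn_tag == "IN":
        return "prep"
    return "xxx"                # every other tag


def to_row(pre: tuple[str, str], academic: tuple[str, str],
           post: tuple[str, str]) -> dict[str, str]:
    """Build one output row from (word, Penn tag) pairs."""
    return {
        "Pre-collocate": pre[0], "Pre-C tag": simplify_tag(pre[1]),
        "Academic word": academic[0], "AW tag": simplify_tag(academic[1]),
        "Post-collocate": post[0], "Post-C tag": simplify_tag(post[1]),
    }


print(to_row(("social", "JJ"), ("policy", "NN"), ("makers", "NNS")))
```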
Filtering by POS tags to include only the target combinations:
N+V
N+Vpp
Adj+N
Adv+Adj
Adv+V
Re-tagging where applicable
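The POS filter can be sketched as a set lookup. The slide does not say whether word order matters within a combination, so this sketch treats each combination as an unordered pair of simplified tags (an assumption). Re-tagging is not modelled. `TARGET_COMBINATIONS` and `filter_candidates` are hypothetical names.

```python
# Target combinations from the slide, stored as unordered tag pairs
# (treating them as unordered is an assumption).
TARGET_COMBINATIONS = {
    frozenset({"n", "v"}),
    frozenset({"n", "vpp"}),
    frozenset({"adj", "n"}),
    frozenset({"adv", "adj"}),
    frozenset({"adv", "v"}),
}


def is_target_combination(tag_a: str, tag_b: str) -> bool:
    return frozenset({tag_a, tag_b}) in TARGET_COMBINATIONS


def filter_candidates(
    candidates: list[tuple[str, str, str, str]],
) -> list[tuple[str, str, str, str]]:
    """Keep (word, tag, word, tag) tuples whose tags form a target combination."""
    return [c for c in candidates if is_target_combination(c[1], c[3])]


candidates = [
    ("social", "adj", "policy", "n"),
    ("widely", "adv", "available", "adj"),
    ("in", "prep", "region", "n"),       # prep + n is not a target combination
]
print(filter_candidates(candidates))
```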
Each collocation is to be judged by an expert committee on its pedagogical value. Each expert is to decide independently whether or not to include the collocation in the final list. The list will be finalised in panel discussion.
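The slides leave the panel procedure open. One hypothetical way to prepare the discussion is to group collocations by how far the independent expert votes agree, so that split decisions can be prioritised. The `triage` function and its three groups are assumptions. In this sketch the panel would still review every group before the list is finalised.

```python
def triage(votes: dict[str, list[bool]]) -> dict[str, list[str]]:
    """Group collocations by agreement among independent expert votes.

    votes maps each collocation to one decision per expert (True = include).
    Unanimous items are separated from split ones, which could be
    prioritised in the panel discussion.
    """
    result: dict[str, list[str]] = {"include": [], "exclude": [], "discuss": []}
    for collocation, decisions in votes.items():
        if decisions and all(decisions):
            result["include"].append(collocation)
        elif decisions and not any(decisions):
            result["exclude"].append(collocation)
        else:
            result["discuss"].append(collocation)   # split or missing votes
    return result


votes = {
    "wide range": [True, True, True],
    "general assembly": [True, False, True],
    "human genome": [False, False, False],
}
print(triage(votes))
```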
Can help students from all academic disciplines to increase their collocational competence and thus their language proficiency
Can assist EAP teachers in their lesson planning
Will inform test development, e.g. item writing, item types and item analysis
Will provide a research tool for investigating the development of academic language proficiency
Sample I: Frequency
Precollocate PreCtag Academicword social popular local wide local free social private public political general local academic public human low resources adj adj adj adj adj adj adj adj adj adj adj adj adj adj adj adj n policy culture authority range cultural previous authorities movement security sector policy indigenous economy widely ethnic assembly government academic productivity writing sphere genome income individual available
AWtag Postcollocate PostCtag n n n n adj adj n n n n n adj n adv adj n n adj nn n n n n adj adj
Status v f v v s d v v s v v s s v s v s s v s s f/s c c c
studies chapter
n n
n vpp n n n
differences
subsequent development v developmental further v professional v develop emotional v physical physical v professional human v subsequent economic v technological social v subsequent historical v technological agricultural v fully spiritual v highly industrial v originally urban technological regional significant
affect xxx development contribute xxx ensure xxx facilitate xxx promote xxx develop xxx developed xxx developing xxx developing xxx
c normal c future
23
An Academic Collocation List | 11/04/11
Discussion
What about the collocations under discussion?
Is there a preferred way of presenting the collocations?
Are there any additional uses for the list?
What should be done with extended phrases?
Some included, others rejected:
1. If POS is not a target combination
2. Not a linguistically complete unit
3. Degree of fixedness
4. Transparent but useful
5. Discourse organisers/referential expressions
6. Common adjectives as headwords
7. Subject specific
8. Implication/connotation
Thank you
kirsten.ackermann@pearson.com