
An Academic Collocation List

Kirsten Ackermann BALEAP 2011, Portsmouth

Contents
1. Introduction
2. Corpus
3. Methodology
4. Sample list
5. Discussion

An Academic Collocation List l 11/04/11

Introduction


Motivation
Significance: collocations are essential for successful language processing and language use
Dilemma: collocations are instantly recognized by native speakers, but remain difficult for learners to acquire and use properly
Proficiency: collocations make fluent language possible, support comprehension, and create a register

Objective
To compile a list of the most frequent and pedagogically relevant collocations in written academic English in order to devise efficient learning, teaching and assessment resources

Tool
Pearson International Corpus of Academic English (PICAE)


Pearson International Corpus of Academic English (PICAE)


PICAE

WRITTEN
Curricular: Textbook, Article
Extracurricular: Administrative material, University/student/alumni magazines, Employment and career information

SPOKEN
Curricular: Lecture, Seminar
Extracurricular: TV broadcasts, Radio broadcasts

PICAE: Written curricular component

333 documents
From 4 academic disciplines: Humanities, Social Science, Natural and Formal Science, Professions and Applied Science
Covering 28 academic subjects: 7 per academic discipline


PICAE: Academic disciplines and subjects


Humanities
Subject             Tokens
History             946,707
Linguistics         855,128
Literature          1,562,046
Art incl. Music     728,532
General academia    627,951
Philosophy          602,233
Religion            198,165
Total               5,520,762

Social Science
Subject             Tokens
Anthropology        1,343,723
Archaeology         1,502,277
Cultural studies    662,054
Gender studies      1,124,097
Politics            295,565
Psychology          858,597
Sociology           239,787
Total               6,026,100

Natural / Formal Science
Subject             Tokens
Earth Sciences      413,237
Chemistry           184,089
Physics             861,656
Computer sciences   520,395
Mathematics         1,090,800
Biology             1,560,745
Ecology             1,832,588
Total               6,463,510

Professions / Applied Science
Subject             Tokens
Architecture        167,074
Business            1,644,180
Education           405,202
Engineering         1,134,950
Health sciences     1,429,679
Media studies       1,500,485
Law                 1,962,002
Total               8,243,572


Methodology


Overview
Step One: Douglas Biber's quantitative analysis of PICAE
Result: list of c. 130,000 collocations and extended phrases
Step Two: Manual vetting of initial collocation list
Result: list of c. 3,500 collocations
Step Three: Expert judgment
Result: list of c. 3,000 pedagogically relevant collocations


Step One: Quantitative analysis of the corpus data
Target output

Initial academic collocation list, with collocation defined as:
A single word that tends to co-occur within a span of 3 words of the target word
At least 1 occurrence per million words
In at least 5 different texts
Mutual Information score of at least 3
T-score of at least 2
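As an illustration of the two association measures named in this definition, the following Python sketch computes a Mutual Information score and a T-score from raw counts and applies the listed cut-offs. The slides do not give the formulas used in the PICAE analysis, so common textbook definitions are assumed here: MI = log2(O/E) and t = (O - E)/sqrt(O), with the expected count E scaled by a six-word window (three positions on each side). The function names and parameters are hypothetical, and the listed values are treated as minimum thresholds.

```python
import math


def expected_cooccurrences(f_target, f_collocate, corpus_size, window=6):
    """Expected co-occurrence count if the two words were independent.

    window=6 is an assumption: three positions on each side of the target.
    """
    return f_target * f_collocate * window / corpus_size


def mutual_information(observed, expected):
    """MI = log2(O / E); assumed definition, not taken from the slides."""
    return math.log2(observed / expected)


def t_score(observed, expected):
    """t = (O - E) / sqrt(O); assumed definition, not taken from the slides."""
    return (observed - expected) / math.sqrt(observed)


def meets_thresholds(observed, f_target, f_collocate, corpus_size, n_texts,
                     min_per_million=1.0, min_texts=5, min_mi=3.0, min_t=2.0):
    """Apply the four cut-offs from the slide as minimum values."""
    if observed <= 0:
        return False
    expected = expected_cooccurrences(f_target, f_collocate, corpus_size)
    per_million = observed / corpus_size * 1_000_000
    return (per_million >= min_per_million
            and n_texts >= min_texts
            and mutual_information(observed, expected) >= min_mi
            and t_score(observed, expected) >= min_t)
```

With invented counts, `meets_thresholds(48, 1000, 500, 1_000_000, n_texts=10)` accepts the pair, while the same pair found in only three texts is rejected by the text-distribution criterion.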


Step One: Quantitative analysis of the corpus data
Computational method

A) Complete reference list
Words appearing more than 12 times in the corpus

B) Reference list of content words
Content words (nouns, verbs, adjectives, adverbs)
Frequency of at least 5 per million words
Distribution in at least 5 different texts
Excluding General Service List headwords
Using stop list to exclude frequent function words with purely grammatical meaning
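The selection criteria for the content-word reference list can be sketched as a simple filter. This is a minimal illustration only: the input format (documents as lists of `(lemma, pos)` pairs), the tag names, and the function name are assumptions, the frequency and distribution values are treated as minimums, and the General Service List and stop list are passed in as plain sets rather than loaded from any real resource.

```python
from collections import Counter

# Simplified tags for content words; the tag names are an assumption.
CONTENT_POS = {"n", "v", "adj", "adv"}


def build_reference_list(documents, gsl_headwords, stop_words,
                         min_per_million=5.0, min_texts=5):
    """Return content words that satisfy the frequency and distribution criteria.

    documents: list of texts, each a list of (lemma, pos) pairs (hypothetical format).
    """
    frequency = Counter()   # raw frequency of each content word
    text_count = Counter()  # number of texts each content word occurs in
    total_tokens = 0
    for doc in documents:
        total_tokens += len(doc)
        seen_in_doc = set()
        for lemma, pos in doc:
            if pos in CONTENT_POS:
                frequency[lemma] += 1
                seen_in_doc.add(lemma)
        text_count.update(seen_in_doc)

    selected = []
    for lemma, count in frequency.items():
        per_million = count / total_tokens * 1_000_000
        if (per_million >= min_per_million
                and text_count[lemma] >= min_texts
                and lemma not in gsl_headwords
                and lemma not in stop_words):
            selected.append(lemma)
    return sorted(selected)
```

Normalising by the total token count (function words included) keeps the per-million figure comparable with the frequencies reported elsewhere in the analysis.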


Step One: Quantitative analysis of the corpus data
Computational method

C) Initial list of c. 130,000 academic collocations
Information for each collocate includes:
Location relative to the target word (-3, -2, -1, +1, +2, +3)
Normed overall frequency (per million words)
Normed frequency (per million words) in each academic discipline
Number of texts it occurs in
Mutual Information score
T-score
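The positional and frequency information listed here can be gathered with a sliding window over the token stream. The sketch below is an assumption-laden illustration, not the procedure used on PICAE: it works on a flat list of lower-cased tokens, ignores sentence and text boundaries, and uses hypothetical function names.

```python
from collections import Counter, defaultdict


def collect_collocates(tokens, targets, span=3):
    """Count collocates of each target word by position (-span..+span, excluding 0).

    Returns {target: {collocate: Counter({position: count})}}.
    """
    table = defaultdict(lambda: defaultdict(Counter))
    for i, word in enumerate(tokens):
        if word not in targets:
            continue
        for offset in range(-span, span + 1):
            j = i + offset
            if offset == 0 or j < 0 or j >= len(tokens):
                continue
            table[word][tokens[j]][offset] += 1
    return table


def most_frequent_position(position_counts):
    """Position at which the collocate occurs most often."""
    return position_counts.most_common(1)[0][0]


def per_million(raw_count, corpus_size):
    """Normed frequency: raw count scaled to one million tokens."""
    return raw_count / corpus_size * 1_000_000
```

For the toy input `"further research is needed and further research was done".split()` with target `research`, the collocate `further` is recorded twice at position -1. Running the same counts per discipline sub-corpus would give the discipline-specific normed frequencies.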


Sample entries from the initial collocation list
Columns: Pre-collocate, Academic word, Post-collocate, Most frequent position, Normed frequency (per million), MI score, T-score, Number of texts, and normed frequency per discipline (PAS, HM, SS, NFS)
Words shown: services, provided, rationale, cultural, gender, for, xxx political, identity, other xxx, provides, to, whole, european, social, can xxx, may, including, information, interact, range, community, experimental, data, issues, measured, remote, affect, information, society, sensing
The 15 entries shown have normed frequencies of 7.99 to 8.12 per million, MI scores of 3.21 to 14.45, T-scores of 11.97 to 13.38, and occur in 10 to 102 texts.


Step Two: Manual vetting of initial collocation list

1. Preparation
Tagging initial collocation list with OpenNLP using a simplified set of tags
Converting tagged output into specified format

Pre-collocate  Pre-C tag  Academic word  AW tag  Post-collocate  Post-C tag
key            adj        concepts       n       -               -
from xxx       prep xxx   region         n       -               -
different      adj        areas          n       -               -
-              -          culture        n       xxx society     xxx n
-              -          community      n       based           vpp
social         adj        organization   n       -               -
future         adj        research       n       -               -
-              -          slightly       adv     more            adv
at xxx         prep xxx   disposal       n       -               -
order xxx      n xxx      obtain         v       -               -
cultural       adj        diversity      n       -               -
-              -          independent    adj     variables       n


Step Two: Manual vetting of initial collocation list

2. Cleaning based on quantitative parameters
Normed frequency of at least 1
MI score of at least 3
T-score of at least 4
Dispersion of at least 0.2

3. POS-analysis of the cleaned list
Filtering by POS tags to include only target combinations: N+V, N+Vpp, Adj+N, Adv+Adj, Adv+V
Re-tagging if applicable
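The cleaning and POS-filtering steps amount to two successive filters over the tagged list. The sketch below is illustrative only: the `Candidate` record and its field names are hypothetical, the four parameters are treated as minimum values, the tag pairs are read as ordered (left word, right word), and the dispersion measure is taken as an already computed number because the slides do not say which measure was used.

```python
from dataclasses import dataclass

# Target combinations from the slide, read as ordered (left tag, right tag) pairs.
TARGET_PATTERNS = {
    ("n", "v"), ("n", "vpp"), ("adj", "n"), ("adv", "adj"), ("adv", "v"),
}


@dataclass
class Candidate:
    """One tagged two-word candidate with its statistics (hypothetical record)."""
    left: str
    left_tag: str
    right: str
    right_tag: str
    normed_freq: float
    mi: float
    t_score: float
    dispersion: float


def passes_thresholds(c, min_freq=1.0, min_mi=3.0, min_t=4.0, min_dispersion=0.2):
    """Quantitative cleaning, with each parameter treated as a minimum."""
    return (c.normed_freq >= min_freq and c.mi >= min_mi
            and c.t_score >= min_t and c.dispersion >= min_dispersion)


def has_target_pattern(c):
    """Keep only the POS combinations targeted for the list."""
    return (c.left_tag, c.right_tag) in TARGET_PATTERNS


def clean(candidates):
    """Apply the quantitative filter, then the POS filter."""
    return [c for c in candidates if passes_thresholds(c) and has_target_pattern(c)]
```

Candidates removed by the POS filter are not necessarily discarded for good: the slide allows re-tagging where the automatic tag was wrong, which would happen before a second pass through `has_target_pattern`.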


Step Two: Manual vetting


4. Qualitative analysis
Judging for each collocation whether to include it in the academic collocation list, mark it for discussion, or exclude it.


Step Three: Expert judgment


1. Each collocation is to be judged by an expert committee on its pedagogical value.
2. A decision is to be made independently by each expert whether or not to include the collocation in the final list.
3. The list will be finalised in panel discussion.


Result: An Academic Collocation List


The Academic Collocation List is a list of the most frequent and pedagogically relevant collocations in academic written English that:
Can help students from all academic disciplines to increase their collocational competence and thus their language proficiency
Can assist EAP teachers in their lesson planning
Will inform test development, i.e. item writing, item type, item analysis
Will provide a research tool for investigating the development of academic language proficiency


The Academic Collocation List: A sample


Sample I: Frequency

Pre-collocate  Pre-C tag  Academic word  AW tag  Post-collocate  Post-C tag  Status
social         adj        policy         n       -               -           v
popular        adj        culture        n       -               -           f
local          adj        authority      n       -               -           v
wide           adj        range          n       -               -           v
-              -          cultural       adj     studies         n           s
-              -          previous       adj     chapter         n           d
local          adj        authorities    n       -               -           v
free           adj        movement       n       -               -           v
social         adj        security       n       -               -           s
private        adj        sector         n       -               -           v
public         adj        policy         n       -               -           v
-              -          indigenous     adj     peoples         n           s
political      adj        economy        n       -               -           s
-              -          widely         adv     used            vpp         v
-              -          ethnic         adj     groups          n           s
general        adj        assembly       n       -               -           v
local          adj        government     n       -               -           s
-              -          academic       adj     writing         n           s
-              -          productivity   n       growth          n           v
academic       adj        writing        n       -               -           s
public         adj        sphere         n       -               -           s
human          adj        genome         n       -               -           f/s
low            adj        income         n       -               -           c
-              -          individual     adj     differences     n           c
resources      n          available      adj     -               -           c


Sample II: Collocation family development


Columns: Status, Pre-collocate, Academic word, Post-collocate (status codes shown: v, s, c)
Word forms in the family: development, developments, developmental, develop, developed, developing
Pre-collocates: subsequent, further, professional, emotional, physical, human, economic, social, historical, technological, agricultural, spiritual, industrial, urban, regional, significant, fully, highly, originally, normal, future
Pre-collocates with an intervening word: affect xxx, contribute xxx, ensure xxx, facilitate xxx, promote xxx
Academic word with an intervening word: develop xxx, developed xxx, developing xxx
Post-collocates: processes, stage, strategies, theory

Sample III: Collocation family research


Pre-collocate + research
adj: considerable, initial, earlier, past, original, primary, extensive, little, major, basic, current, empirical, previous, future, scientific, further, recent
p/adj: existing, published
n: field
v: undertake, conducting

research + Post-collocate
n: efforts, effort, purposes, methodology, evidence
vpp/adj: published, undertaken, conducted

Pre-collocate + research
adj: traditional, comparative, educational, experimental, economic, historical, national, medical, academic, quantitative, qualitative


Discussion


What about the collocations under discussion?
Is there a preferred way of presenting the collocations?
Are there any additional usages for the list?
What to do with extended phrases?

1. Some included, others rejected
2. If POS is not a target combination
3. Not a linguistically complete unit
4. Degree of fixedness
5. Transparent but useful discourse organisers/referential expressions
6. Common adjectives as headwords
7. Subject specific
8. Implication/connotation


Thank you
kirsten.ackermann@pearson.com
