
Collocation and chunking

Language is strongly patterned: many
words occur repeatedly in certain
lexicogrammatical patterns.
Psycholinguistic research suggests that language is
processed in chunks. The basic unit for
encoding and decoding may be the group,
set phrase, or collocation, rather than
the orthographic word.
Collocation - definition
Collocation is the occurrence of two or
more words within a short space of each
other in a text. (J.M. Sinclair, Corpus,
Concordance, Collocation, OUP, 1991)
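Sinclair's definition can be made concrete as a window-based count. The sketch below is only an illustration, not Sinclair's own procedure: the function name window_cooccurrences, the span setting, and the toy sentence are all assumptions introduced here.

```python
from collections import Counter

def window_cooccurrences(tokens, span=4):
    """Count unordered word pairs occurring within `span` tokens of each other.

    `span` is a hypothetical setting; the definition only says "a short space".
    """
    counts = Counter()
    for i, node in enumerate(tokens):
        # Look only to the right so each pair of positions is counted once.
        for collocate in tokens[i + 1 : i + 1 + span]:
            pair = tuple(sorted((node, collocate)))
            counts[pair] += 1
    return counts

# Toy text, invented for illustration.
text = "rancid butter and rancid fat spoil the jam tarts"
pairs = window_cooccurrences(text.split(), span=2)
```

With a span of two words, pairs records that rancid and butter fall within the window twice, while jam and rancid never do. Raw counts like these say nothing yet about statistical significance; an association measure would have to be computed on top of them.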
Collocation denotes frequently repeated or
statistically significant co-occurrences,
whether or not there are any special
semantic bonds between collocating
Collocation: simple co-occurrence of
Anomalous collocation designates a
class of fixed expressions and idioms (FEIs),
with the subtypes ill-formed collocation,
cranberry collocation, defective collocation,
and phraseological collocation.
Kinds of collocation
Collocations are the lexical evidence that words
do not combine randomly but follow rules,
principles, and real-world motivations. Different
kinds of collocation reflect different kinds of
Three kinds of collocation:
1. The simplest kind arises through semantics:
co-occurrence of co-members of semantic fields,
representing co-occurrence of the referents in
the real world, e.g. the word jam co-occurs with
other words from the lexical set food, such as
tarts, butty, doughnuts, marmalade, apricot,
2. A second kind of collocation arises
where a word requires association with a
member of a certain class or category of
item, and such collocations are
constrained lexicogrammatically as well as
semantically, e.g. the adjective rancid is
typically associated with butter, fat, and
foods containing butter or fat.
In other cases, a word has a particular
meaning only when it is in collocation with
certain other words, e.g. face the
Also, selection restrictions on verbs may
specify certain kinds of subject or object,
e.g. the verb drink normally requires a
human subject and a liquid as object.

3. A third kind of collocation is syntactic,
and arises where a verb, adjective, or
nominalization requires complementation
with, for example, a specified particle.
Such collocations are grammatically well
formed and highly frequent, but not
necessarily holistic and independent, e.g.
to be, one of, had been, you know, thank
you very much, are going to be, etc.
Two principles underlying language
The open choice principle
The idiom principle
These two principles are diametrically
opposed, and both are required in order to
account for language.
The open choice principle: a way of seeing
language text as the result of a very large number
of complex choices. At each point where a unit is
completed (a word or a phrase or a clause) a
large range of choices opens up, and the only
restraint is grammaticalness.
The idiom principle: a language user has
available to him a large number of
semi-prestructured phrases that constitute single
choices.
Thus at a point in text where the open choice
model would suggest a large range of possible
choices, the idiom principle restricts it over and
above predictable semantic restraints that result
from topic or situational context. A single choice
in one slot may be made which dictates which
elements will fill the next slot/s, and prevents the
use of free choice.

Example: of course orthography and
The open choice model suggests that this
sequence comprises two different choices:
one at the of slot, and one at the course slot.
The idiom principle suggests that it is a
single choice which coincidentally occupies
two word spaces.
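The contrast between the two models can be sketched as two ways of segmenting the same string. The code below is a minimal illustration under stated assumptions: the function chunk, the greedy longest-match strategy, and the small PHRASES list are hypothetical stand-ins for a speaker's stock of prestructured phrases, not a claim about how speakers actually process text.

```python
def chunk(tokens, phrases):
    """Greedy longest-match segmentation: known multi-word phrases become one unit."""
    max_len = max((len(p) for p in phrases), default=1)
    units, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, falling back to a single word.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i : i + n])
            if n == 1 or candidate in phrases:
                units.append(" ".join(candidate))
                i += n
                break
    return units

# Hypothetical phrase list, invented for illustration.
PHRASES = {("of", "course"), ("you", "know")}

words = "of course you can come in".split()
open_choice = words                 # one choice per orthographic word
idiom_view = chunk(words, PHRASES)  # "of course" counts as a single choice
```

Under the open choice view the sentence involves six choices; under the idiom view it involves five, because of course fills two word spaces with one unit.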
The idiom principle
This principle is seen not only in fixed strings (e.g. of
course) but also in other kinds of phraseological unit,
e.g. greetings and social routines demonstrate the
idiom principle. Sociocultural rules of interaction restrict
choices within an exchange which may be realized in
fairly fixed formulations.
Sayings, similes, and proverbs also represent single
choices, even when they are truncated or manipulated,
and they may be prompted discoursally as stereotyped
responses, e.g. (every cloud has) a silver lining; no
news is good news. These are predictable comments
on common experiences.
There are also recurrent clauses and other
units that demonstrate the idiom principle,
e.g. from can I come in?, are you ready?
to it's as easy as falling off a log.
Memorized clauses and clause
sequences form a high proportion of the
fluent stretches of speech heard in
everyday conversation.
Psycholinguistic aspects of
Research into language acquisition
suggests that language is learned, stored,
retrieved, and produced in multi-word
items, not just as individual words or
Processing of FEIs
Research into the psycholinguistic processing of FEIs
addresses questions such as: how FEIs are recognized;
how they are stored in the mental lexicon; whether
idiomatic meanings are retrieved before, after, or
simultaneously with literal meanings; how variations
and inflections are handled.
In attempting to find out how FEIs are processed, the
notion of the idiom list has been incorporated into the
hypothesis that idioms are stored separately in the
mental lexicon. The analysis of the literal meaning
occurs separately from the idiomatic meaning. The
literal meaning is normally processed first, and when the
processing fails to yield an interpretation for the context,
the idiom list is accessed.
According to another hypothesis, idioms
are stored and retrieved like single
words, and idiomatic and literal
meanings are processed
simultaneously. The experiments show
that subjects decode idiomatic meanings
faster than literal ones.
There is a third hypothesis, which
introduces the notion of the key word,
which is a component word in an FEI that
triggers recognition of the whole.
With respect to FEIs, lexicalization is the process by
which a string of words and morphemes becomes
institutionalized as part of the language and develops its
own specialist meaning or function.
Lexicalization of FEIs results from a three-way tension
between the quantitative criterion of institutionalization, the
lexicogrammatical criterion of fixedness, and the
qualitative criterion of non-compositionality, but there are
problems with all these criteria: institutionalization and
frequency are not enough on their own, fixedness can be
misleading (there is instability of forms), and
non-compositionality is dependent on the ways in which the
meanings of individual words are analysed, both in
dictionaries and in notional lexicons.
Diachronic considerations
Institutionalization is a diachronic process:
much of the lexical, syntactic and semantic
anomalousness of FEIs results from historical
processes. Cranberry collocations such as to
and fro and kith and kin contain lexical items that
were formerly current.
The ill-formed collocation "through thick and
thin" is an ellipsis of "through thicket and thin
wood", and "of course" is an ellipsis of "a matter
of course", or "of course and custom", or "of
common course".
FEIs disappear, and others emerge.
Metaphors, initially transparent, come in
from sporting, technical, and other
specialist domains, e.g. business
metaphors such as there's no such thing
as a free lunch. As neologisms become
institutionalized and divorced from their
original contexts of use, the explanation or
motivation for the metaphor may become
lost or obscure.

Some metaphorical FEIs and proverbs may be
traced back to classical or Biblical sayings or
historical events, e.g. better late than never, all
roads lead to Rome, an eye for an eye, burn
one's bridges/boats.
Catchphrases drawn from cinema, television,
politics, journalism and so on become
institutionalized as sayings and other kinds of
formula; this is an obvious way in which English
fixed expressions realize intertextuality:
And now for something completely different
Didn't she do well
Go ahead, make my day
I think we should be told
I'll be back
I'll have what she's having
Pass the sick bag, Alice
That will do nicely
There is no alternative (abbreviated as TINA)
This could be the beginning of a beautiful friendship
The white heat of this revolution
We wuz robbed
It takes two to tango (song by Hoffman and Manning)
When the going gets tough, the tough get going (popularized by Joseph
The opera isn't over until the fat lady sings (Dan Cook)
The catchphrases above are associated with a memorable event or
film sequence, or consistent media use; they are repeated as
commentary devices, greetings and so on, and become situationally
or culturally bound.
In other cases, FEIs become established as pithy ways of
expressing or referring to concepts; hyphenation is an indicator of
the process of institutionalization and lexicalization. The catenation
of strings into quasi-single words signals the writer's intention to
consider a string as a unit, e.g.:
on a first-come-first-served basis
his charity-begins-at-home appeal
a don't-take-no-for-an-answer message
Six months ago it (sc. a hotel) changed owners, but remained in the
The chaos might amuse the man who belonged to the
live-fast-die-young-have-a-good-looking-corpse school.
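Hyphenated strings of this kind can be located mechanically in running text. The sketch below is one possible approach, assuming a simple regular expression; the pattern name HYPHENATED_STRING, the threshold of three hyphen-joined parts, and the sample sentence are assumptions made for illustration.

```python
import re

# Three or more word-like parts joined by hyphens; apostrophes may occur inside a part.
# The three-part threshold is an assumption, chosen to skip ordinary two-part compounds.
HYPHENATED_STRING = re.compile(r"[A-Za-z']+(?:-[A-Za-z']+){2,}")

def hyphenated_strings(text):
    """Return every hyphen-joined string of three or more parts found in `text`."""
    return HYPHENATED_STRING.findall(text)

# Sample sentence assembled from the examples above, plus one ordinary compound.
sample = ("his charity-begins-at-home appeal and a well-known "
          "don't-take-no-for-an-answer message")
found = hyphenated_strings(sample)
```

Here found contains charity-begins-at-home and don't-take-no-for-an-answer but not well-known, which has only two parts. A pattern like this only flags candidates; whether a given string is institutionalized still has to be judged from its frequency and use.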