[…] pointing out its resemblances to face-to-face conversation. The reason for
such an investigation lies in the fact that movie language is traditionally
considered to be non-representative of spontaneous language. The book
presents a corpus-driven study of the similarities between face-to-face […]
www.peterlang.com
LINGUE E CULTURE
Languages and Cultures – Langues et Cultures
Collana diretta da / Series edited by / Collection dirigée par
Marisa Verna
Giovanni Gobber
Pierfranca Forchini
PETER LANG
Bern · Berlin · Bruxelles · Frankfurt am Main · New York · Oxford · Wien
Bibliographic information published by die Deutsche Nationalbibliothek
Die Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data is available on the Internet at ‹http://dnb.d-nb.de›.
Forchini, Pierfranca.
Movie language revisited : evidence from multi-dimensional analysis and corpora /
Pierfranca Forchini.
p. cm. -- (Lingue e culture (languages and cultures - langues et cultures), v. 1)
Includes bibliographical references.
ISBN 978-3-03-431076-5
1. Conversation analysis. 2. Motion pictures. 3. Language and literature. I. Title.
P95.45.F67 2012
791.4301’41--dc23
2011047258
EISBN 9783035103250
ISBN 9783034310765
© Peter Lang AG, International Academic Publishers, Bern 2012
Hochfeldstrasse 32, CH-3012 Bern, Switzerland
info@peterlang.com, www.peterlang.com
Printed in Switzerland
For my mom,
Mariateresa Negrinotti
Acknowledgments
This book would not have come into being without the precious help
and support from many people. First and foremost, I would like to
express my profound gratitude to the members of the Department of
Scienze Linguistiche e Letterature Straniere at Università Cattolica del
Sacro Cuore (Milan, Italy) for providing me with an intellectually
stimulating environment. Special thanks are due to Marisa Verna,
Head of the Department, and to Margherita Ulrych, my dissertation
supervisor. I first became interested in the comparison between movie
and face-to-face conversation during my PhD dissertation research, and
I am extremely grateful to Margherita Ulrych for her guidance and
encouragement in pursuing it. My heartfelt thanks also go to the two
anonymous referees, who made detailed and helpful comments, and to
Amanda Murphy, who read the manuscript and gave me incredible
support.
I would also like to thank Northern Arizona University (USA). I am
immensely grateful to Douglas Biber, who wrote the preface for
this book, and to Randi Reppen for offering time, collaboration, and
friendship. Douglas Biber’s ideas, methodology and meticulous com-
ments had a major influence on this book.
My sincere thanks also to John Sinclair, who is no longer with us,
but whose ideas and personality remain illuminating.
Last, but not least, my deepest gratitude and thanks to my extended
family for never-ending love and support.
And each of us, in the lifelong enterprise of analysis, of
study, of teaching, in the long journey through the
labyrinths of the spirit on which our vocation has set us,
may perhaps think with modesty and also with joy that
this enterprise of having studied an object so
precious, so transcendent, so perfect, was worth
le blanc souci de notre toile
(the white care of our sail)
Sergio Cigada
Contents
Preface
Chapter 1
Opening Credits: Face-to-Face and Movie Conversation
1.1 Introduction
1.2 Spoken Language Studies
1.2.1 Determinants and Features of Face-to-Face Conversation
1.3 Movie Conversation Studies
1.3.1 Fictitiousness or Spontaneity in Movie Language?
1.4 Biber’s Multi-Dimensional Analysis
Chapter 2
The Making of: Methodology and Data
2.1 Introduction
2.2 Methodological Steps
2.3 The Longman Spoken American Corpus (LSAC)
2.4 The American Movie Corpus (AMC)
2.4.1 Building Criteria and Norming
2.4.2 Transcription Criteria and Tagging
Chapter 3
Shot 1: Multi-Dimensional Analysis of Face-to-Face and Movie Conversation
3.1 Introduction
3.2 Informational vs. Involved Production (Dimension 1)
3.3 Narrative vs. Non-Narrative Concerns (Dimension 2)
3.4 Explicit vs. Situation-Dependent Reference (Dimension 3)
3.5 Overt Expression of Persuasion (Dimension 4)
3.6 Abstract vs. Non-Abstract Information (Dimension 5)
3.7 Summary
Chapter 4
Shot 2: Close-ups
4.1 Introduction
4.2 Multi-Dimensional Analysis of Movie Genre
4.2.1 Informational vs. Involved Production (Dimension 1)
4.2.2 Narrative vs. Non-Narrative Concerns (Dimension 2)
4.2.3 Explicit vs. Situation-Dependent Reference (Dimension 3)
4.2.4 Overt Expression of Persuasion (Dimension 4)
4.2.5 Abstract vs. Non-Abstract Information (Dimension 5)
4.2.6 Summary
4.3 Phraseological Comparisons
4.3.1 Word Lists
4.3.2 Lexical Bundles: Two- and Four-grams
4.3.3 Multi-Word Sequences and Pattern Types
4.3.4 Summary
Chapter 5
Closing Credits: Implications and Applications
5.1 Authentic Movie Language
5.1.1 A Reflection of Spoken Language
5.1.2 A Source for Spoken Language Teaching
5.2 Concluding Remarks and Future Directions
Appendices
Appendix 1. Linguistic Features Codes (Multi-Dimensional Analysis)
Appendix 2. Face-to-Face Conversation Means Procedure (Multi-Dimensional Analysis)
Appendix 3. Movie Conversation Means Procedure (Multi-Dimensional Analysis)
Appendix 4. Movie Conversation Feature Counts (Multi-Dimensional Analysis)
Appendix 5. Multi-Dimensional Analysis of Borderline Genre Movies
References
Preface
Over the last several decades, English language movies (and television
shows) have probably had a greater influence on spreading English
world-wide than any other mechanism. Viewers around the world
regularly watch English-language movies, and many ELT professionals
advocate using such movies in the classroom to teach and model natu-
ral conversation. However, other professionals caution that movie lan-
guage is not natural conversation, and thus not an accurate model for
language learners.
Surprisingly, this issue has not been previously investigated empiri-
cally. But in this important book, Forchini does exactly that, using
large-scale corpus analysis to compare the linguistic characteristics of
movie dialogues with spontaneous face-to-face conversations. Forchini
applies a variety of research approaches, including detailed considera-
tion of individual lexical phrases and linguistic features, as well as
Multi-Dimensional Analysis, and also compares the characteristics of
movie comedies and dramas. The results will surprise some readers,
indicating that movie dialogues are in many ways very similar to spon-
taneous face-to-face conversations with respect to a wide range of
linguistic characteristics. In conclusion, Forchini discusses the impli-
cations of these findings for ELT professionals looking for ways to help
their students acquire natural conversational skills.
Douglas Biber
Regents’ Professor
Northern Arizona University
Chapter 1
Opening Credits: Face-to-Face
and Movie Conversation
1.1 Introduction
Figure 1. Spoken language sub-categories (Gregory and Carroll 1978: 47).
Would it not be problematic to identify the language as either face-
to-face or movie conversation? Consider, for example, the use of deixis
and ellipsis in the exchange (S1) Oh, my God. There she is. There’s Rose-
mary. (S2) Where? (S1) Right there. (S2) Right where? (S1) Straight ahead.
Across the field; and the use of pragmatic/discourse markers (I mean and
like), of response tokens (yeah), of hesitators/inserts (oh and uh), of
grammaticalized verbs (wanna), of vague language (stuff), and of mild
expletives (oh my God and holy cow). These would appear to be char-
acteristic of natural conversation, although the passage is actually taken
from the movie Shallow Hal (Bobby and Peter Farrelly 2001).
The following extract is from face-to-face conversation:
Extract 2:
Speaker1 Where are the plates?
Speaker2 Up on the shelf right here behind this door.
Right see right there on the, turn around. No, no, no.
Speaker1 <unclear>
Speaker2 Oh I thought you were getting the plates.
Speaker1 Yeah it’s behind here huh?
Speaker2 Yeah.
Speaker1 Okay.
Speaker2 Just run hot water. Let me get this.
Speaker1 Why are you doing that? It’s evening time now.
Speaker2 I know because I missed out.
Speaker1 You missed out.
Speaker2 I don’t know what I was thinking about.
Here, too, deixis and ellipsis are prominent: (S1) Where are the plates?
(S2) Up on the shelf right here behind this door. Right see right there on
the, turn around. No, no, no. So are response tokens and hesitators/inserts
such as yeah, huh, and oh. It would appear that the speakers in Extract 2
use the same strategies as the speakers in Extract 1 and, consequently,
that face-to-face and movie conversation share recognizable spoken
features. To what extent, then, are these
features present in the two conversational domains? Do they serve any
functions, and if so, are these functions similar or different? Further-
more, if these functions are similar, from a theoretical point of view,
what weight should the artificial status accorded to movie language
carry?
These questions are the grounds for the comparison between face-
to-face and movie conversation which the present investigation offers.
The reason for carrying out such a study lies firstly in the fact that
not many scholars have written about movie dialogues in general, and
very few have compared movie language to face-to-face conversation.
Some related studies have focused on issues of dubbing or sub-titling
movies into other languages, comparing the original and dubbed or
subtitled versions (Bollettieri Bosinelli 1998; Gottlieb and Gambier
2001; Pavesi 2005). Secondly, a considerable amount of work has been
carried out on movie scripts (Taylor 1999, Taylor and Baldry 2004),
but not on transcribed movie dialogues. Thirdly, some strongly-worded
claims about the non-spontaneity of movie language have been based
on intuition, rather than on empirical evidence: Sinclair (2004b: 80),
without providing data, maintains that movie language is “not likely
to be representative of the general usage of conversation” in that its
distinctive features do not “truly reflect natural conversation”. Lastly,
there are no studies of movie language that apply Biber’s (1988) Multi-
Dimensional Analysis approach, which has proven to be reliable as an
empirical method of describing the linguistic characteristics of texts.
Rather than trusting intuitive judgments or examining movie (web)
scripts, then, in this work I present an empirical investigation of the
linguistic similarities or differences between face-to-face and movie
conversation via Multi-Dimensional and corpus-driven analyses. I ad-
dress the following research questions: at a macro-level, to what ex-
tent do face-to-face and movie conversation differ or resemble each
other? At a micro-level, to what extent does movie genre influence this
similarity or difference? And to what extent do lexical bundles, multi-
word sequences and pattern types resemble each other or differ in the
two conversational types investigated? As a means to an answer, I com-
pare the data from an existing spoken American English corpus (the
Longman Spoken American Corpus) to a corpus of American movie con-
versation (the American Movie Corpus), which I purposely built (see
Chapter 2).
Conceptually, the work is divided into two main parts: Chapters 1
and 2 introduce the theoretical background, and Chapters 3 and 4
offer the practical analyses. Chapter 5, the concluding chapter, sum-
marizes the findings of the research and highlights its main implica-
tions and applications. In detail, Chapter 1, Opening Credits, illustrates
the reasons why spoken language is a relatively new field of research,
presenting the main contributions to studies of spoken language, and
providing an overview of the determinants and features that charac-
terize spontaneous conversation. It then explores movie conversation,
a type of speech whose status needs clarification on the basis of
empirical evidence. Thirdly, it illustrates in detail Biber’s
Multi-Dimensional Analysis, which is the statistical and computational
methodology used to compare face-to-face and movie conversation.
Chapter 2, The Making of,¹ focuses on the main steps of this methodology
and describes the corpora used to retrieve the empirical data.
The data analysis, which was made possible especially thanks to the
collaboration and support of Douglas Biber at Northern Arizona Uni-
versity, is divided into two chapters, Shots 1 and 2. Chapter 3 (Shot 1)
zooms out for a long shot of face-to-face and movie conversation and
offers a quantitative and qualitative macro-investigation by applying
Biber’s Multi-Dimensional Analysis approach. Chapter 4 (Shot 2), in-
stead, zooms in for close-ups of movie genre as a variable that could
influence the analysis, and of lexical bundles, multi-word sequences
and pattern types in the two domains investigated by corpus-driven
analyses.
The crux of the book’s argument, reiterated in Chapter 5, Closing
Credits, is that movie conversation does not, in fact, differ significantly
from face-to-face conversation, and can therefore be legitimately used
to study spoken language. The extensive Multi-Dimensional Analyses
of the linguistic features in the two domains are presented in the Ap-
pendices.
1.2 Spoken Language Studies
with an imperfect product, especially compared to the accuracy with
which the latest optical text scanners can quickly gobble up vast
amounts of written text and deposit them in machine-readable form”
(McCarthy 1999: 13; see also Biber et al. 1999: 1041, Halliday 2005:
162).
The main contributions to studies of spoken language in the twen-
tieth century have come from scholars working in the field of Prag-
matics. They studied meaning in interaction, that is, those aspects of
language that cannot be considered in isolation from use, thus pro-
viding insights into speech acts (Austin 1962, Searle 1969), implicature
(Grice 1975), and politeness (Brown and Levinson 1987), inter alia.
Others working within the Conversation Analysis paradigm, and deal-
ing more specifically with talk-in-interaction (Thomas 1995), have
discovered and described patterns such as turn-taking (Sacks 1992;
Ford, Fox, and Thompson 2002) and adjacency pairs (Sacks 1992). A
new slant on spoken language was undoubtedly introduced by Cor-
pus Linguistics, which has facilitated the creation of research meth-
ods by attempting to trace a path from data to theory. For example,
the idea that speech has low informational content derives from the
statistical evaluation of word frequencies, which has shown that it has
a high frequency of pronouns and verbs and a low frequency of nouns
(cf. Biber et al. 1999, McCarthy 1999, Biber and Finegan 2001a,
Carter and McCarthy 2006). Another key area of corpus-based research
is register variation: it has been shown, for instance, that noun phrases
are more complex in written than in spoken texts, and that many of
the words of spoken language “clearly belong to the traditional prov-
ince of grammar / function words, in that they are devoid of lexical
content” (McCarthy 1999: 5; see also Halliday 2005).
Thanks to such studies, it has been possible to identify a spectrum
of linguistic features which depend on what Biber et al. (1999: 1041)
label “determinants of conversation”. These determinants, which char-
acterize spontaneous conversation, will be described in the next sec-
tion. The term face-to-face conversation is used here to refer to sponta-
neous conversation which, as Miller and Weinert (1998: 22) point out,
“is typically produced by people talking face-to-face”.
1.2.1 Determinants and Features of Face-to-Face Conversation
versation is constructed by at least two interlocutors, i. e. a speaker
and a hearer, who dynamically shape their speech within the on-
going exchange. This oscillating movement between the interlocu-
tors is evident in utterance-response sequences, or adjacency pairs,
which “may be either symmetric, as in the case of one greeting echo-
ing another, or asymmetric, such as a sequence of question followed
by answer” (Biber et al. 1999: 1045). Another source of conver-
sational dynamism is the routine use of discourse markers and
similar devices such as interjections, gap fillers, hedges, tags, back-
channels, connectors, and vocatives, which signal the interactive na-
ture of the speaker’s utterance (cf. Erman 1987, Bazzanella 1990,
Gavioli 1999, Aijmer and Stenström 2005, Redeker 2006). The use
of politeness in exchanges such as greetings, requests, offers, and
apologies (Brown and Levinson 1987, Biber et al. 1999, Stame
1999) is another contributing factor to determinant (d).
The presence of supra-segmental and paralinguistic features (determi-
nant a) and shared knowledge in conversation (determinant c) is re-
flected in the conversants’ use of reference and implicit non-elaborated
meaning. These features convey information which may not be obvi-
ous to an outsider, and allow for a reduction of the number of words
uttered, and for simplified grammatical structures. An example of this
is the use of non-clausal or grammatically fragmentary components
such as “stand-alone words” (e. g. interjections) which “rely heavily for
their interpretation on situational factors” (Biber et al. 1999: 1042).
The high frequency of words with a referring function, such as pro-
nouns used in place of nouns, is another example of the way shared
knowledge influences the shape of the language of conversation (Biber
1988).
The on-the-fly trait of face-to-face conversation, determinant (b),
typically gives way to what has been called normal dysfluency (Biber et
al. 1999: 1048) and fragmented language (Chafe 1982: 39). As
McCarthy (1998) and Biber et al. (1999) point out, a speaker’s flow
is naturally impaired by pauses, repetitions (I – I – I), and hesitators
(er, um), especially when the speaker needs to keep the conversation
going and his/her mental planning needs to catch up. Undoubtedly,
even though people do not have time to shape a flow of ideas into
complex and well-integrated utterances, some degree of planning may
be involved and the rate of communication can vary. This planning
clearly comes about before speech happens, in contexts where the
speaker knows what to say, or when the speaker and hearer share
knowledge (Biber et al. 1999).
Finally, the determinants of real time (b) and interactivity (d) lead
speakers to repeat the same repertoire of expressions, falling back on
prefabricated sequences of words. Speakers find it difficult to make
full use of their grammar and lexicon when there is time pressure.
Repetition of prefabricated word sequences, which are readily accessi-
ble from memory (cf. Chafe 1982, Tannen 1982, McCarthy 1998,
Bazzanella 1999, Biber et al. 1999, Cameron 2001, Halliday 2005),
can help them buy time to plan the next chunk of speech. Further-
more, unlike in written language, the fast rate of conversation con-
strains speakers to reduce the length of what they have to say to save
time and energy: in the words of Biber et al. (1999: 1048), “speed of
repartee, making an opportune remark, getting ‘a word in edgeways’
in a lively dialogue, or reaching the point quickly, may all add urgency
to the spoken word”.
Further evidence that speakers prefer repetitive structures is offered
by McCarthy (1999), who suggests the existence of a core vocabulary
of spoken English. The basis of his claim is the fact that “in compu-
ter-based frequency counts, there is usually a point where frequency
drops off rather sharply, from hard-working words which are of ex-
tremely high frequency to words that occur relatively infrequently”
(McCarthy 1999: 2). He identifies nine broad categories of a basic
spoken vocabulary:
– discourse markers (I mean, right, well, so, good, you know, anyway);
– modal items (modal verbs, lexical modals, adverbs, and adjectives);
– delexical verbs (do, make, take, and get);
– interactive words (just, whatever, thing(s), a bit, slightly, actually,
basically, really, pretty, quite, literally);
– basic nouns (person, problem, life, noise, situation, sort, trouble, family,
kids, room, car, school, door, water, house, TV, ticket);
– general deictics (this, that, here, there, now, then, ago, away, front,
side, …);
– basic adjectives (lovely, nice, different, good, bad, horrible, terrible);
– basic adverbs (especially those referring to time, such as today, yes-
terday, tomorrow, eventually, finally; frequency and habituality, such
as usually, normally, generally; and manner and degree such as quickly,
suddenly, fast, totally, especially);
– basic verbs for actions and events (sit, give, say, leave, stop, help, feel,
put, listen, explain, love, eat, enjoy).
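The kind of computer-based frequency count on which McCarthy's claim rests can be illustrated with a minimal sketch. It assumes a plain-text sample and a crude regular-expression tokenizer; the function name frequency_list and the toy sample are hypothetical and are not part of the procedure followed in this study.

```python
from collections import Counter
import re


def frequency_list(text):
    """Return (word, count) pairs ordered from most to least frequent."""
    # Crude tokenizer (an assumption): lower-cased runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common()


# Toy sample invented for illustration only; a real word list needs a full corpus.
sample = (
    "yeah I know I mean you know it is right there "
    "yeah right I know you know the plates"
)

ranked = frequency_list(sample)
for rank, (word, count) in enumerate(ranked[:5], start=1):
    print(rank, word, count)
```

On a real corpus, reading such a list from the top downwards is one way of locating the point at which frequency falls away from the small set of hard-working words.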
To keep the conversation going, speakers also use redundant informa-
tion. Consequently, spontaneous speech typically contains a large
number of connectors, gap fillers, hedges, tags, backchannels, inter-
jections and discourse markers (Erman 1987, Cameron 2001).
All the features illustrated so far have been corroborated in studies
which apply multivariate statistical techniques, such as factor and
cluster analyses² (Biber 1985, Biber and Finegan 1986). These studies
show that face-to-face conversation is interpersonal, situation-dependent,
and has no narrative concern³, or as Biber and Finegan (1986)
put it, is a highly interactive, situated and immediate text type. It is
typically interactive because it displays “frequent occurrence of features
like first and second person pronouns, questions, hedges, contractions,
that-clauses, if-clauses” (Biber and Finegan 1986: 40). At the same time,
it is situated, because place and time adverbs are frequently used, and
immediate because present tenses are more common than past tenses.
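A few of the interactive features named above can be counted with a minimal sketch. It assumes raw untagged text and very crude heuristics, whereas the studies cited rely on tagged corpora; the pronoun set, the function name interactive_features and the sample (adapted from Extract 2) are illustrative assumptions only.

```python
import re

# Assumed, non-exhaustive set of first and second person pronoun forms.
FIRST_SECOND_PERSON = {
    "i", "me", "my", "mine", "we", "us", "our", "ours",
    "you", "your", "yours",
}


def interactive_features(text):
    """Count a few surface markers of interactivity in a stretch of text."""
    # A token is a run of letters, optionally followed by an apostrophe part.
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())
    return {
        "words": len(tokens),
        "pronouns": sum(t in FIRST_SECOND_PERSON for t in tokens),
        # Crude heuristic: any token with an apostrophe counts as a contraction,
        # so possessives are over-counted and forms like "I'm" are not
        # counted as pronouns.
        "contractions": sum(("'" in t) or ("’" in t) for t in tokens),
        "questions": text.count("?"),
    }


sample = (
    "Where are the plates? Oh I thought you were getting the plates. "
    "Yeah it's behind here huh? I don't know what I was thinking about."
)
profile = interactive_features(sample)
print(profile)
```

Counts of this kind only become comparable across texts of different lengths once they are normed to a common base.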
The above-mentioned traits of spontaneous face-to-face conversa-
tion make it typically informal. Consequently, since informality is of-
ten associated with laxity in terms of form, it is thought that face-to-
face conversation is not influenced by the traditions of prestige and
correctness associated with written texts, “where the English language
is on its best behaviour” (Biber et al. 1999: 1050). Two points can be
made in this regard. First, the linguistic features of spontaneous con-
versation, which are intrinsically pragmatic, should not be treated as
performance errors. These features simply reflect the conditions un-
der which it is produced: “the structures of spontaneous spoken lan-
guage have developed in such a way that they can [italics in text] be
used in the circumstances in which conversation […] usually takes
place” (Miller and Weinert 1998: 23). That is to say, in Schiffrin’s
(1987: 5) words, “language is potentially sensitive to all of the con-
texts in which it occurs […] and reflects [bold in text] these contexts
because it helps to constitute them”. Given the impromptu nature of
face-to-face conversation, devices like gap fillers are typically used to
keep the conversation going, and since this type of conversation usu-
ally takes place between people who are on familiar terms, a range of
informal expressions, such as non-standard verb forms like ain’t in Brit-
ish English, or y’all as a second person pronoun in Southern varieties
of American English may be employed (Biber et al. 1999: 1050).
Secondly, as Halliday (1985b) points out, it is only when spoken
language is perceived in terms of its transcription that it appears to
lack a clearly defined shape or form. If the transcription of a written
text included the planning processes behind it (e. g. brainstorming,
drafting, editing), the final text would appear amorphous as well.
1.3 Movie Conversation Studies
as the language(s) spoken, sounds, noises, etc., and the visual chan-
nel, which includes all the physical images within the movie, such
as road signs, shop fronts, clothes, colors, body movements, face
expressions, and so on. The possible interplays and combinations
of these two channels are fundamental to the delivery of their “aural-
verbal, aural non-verbal, visual-verbal and visual non-verbal messages”
(Remael 2001: 14), and to the production of their meaning (Cattrysse
2001).
According to Chaume (2004c: 16–21), the following codes can be
distinguished:
1. the linguistic code, which is the language used;
2. the paralinguistic code, which denotes features that provide non-
verbal information, but which are still auditory (e. g. laughter);
3/4. the musical and the special effects codes, which are represented,
respectively, by songs and by illusions created by props, camerawork,
computer graphics, etc.;
5. the sound arrangement code, which deals with sounds that either
belong to the story (i. e. diegetic sound) or come from a person or
object that is not part of the story, such as an off-screen narrator
(i. e. non-diegetic sound). The sound arrangement code covers
both the sounds that are produced on-screen (those associated
with the vision of the sound source) and those produced off-screen
(those whose origin is not present in the frame and therefore not
visible simultaneously with the perception of the sound);
6. the iconographic code, which represents the icons, indices, and
symbols in the movie;
7. the photographic code, which deals with changes in lighting, in
perspective, or in the use of color (e. g. color vs. black and white
or intentional use of some colors);
8. the planning code, which depends on the types of shots (e. g.
close-ups and extreme close-ups);
9. the mobility code, which includes proxemic (i. e. related to space)
and kinetic (i. e. related to motion) signs, and the screen charac-
ters’ mouth articulation;
10. the graphic code, which is the written language present on screen
(i. e. titles, intertitles, texts, and subtitles);
11. the syntactic code, which is the editing, namely, the process of shot
associations.
The linguistic, the paralinguistic, the musical, the special effects, and
the sound arrangement codes are transmitted by the auditory/acous-
tic channel, whereas the iconographic, photographic, the planning, the
mobility, the graphic, and the syntactic/editing codes are transmitted
by the visual channel (Chaume 2004c: 16).
The lack of empirical evidence can also be ascribed to the fact that it
is not easy to find transcriptions of movie dialogues, and the absence of
freely available movie corpora is a hindrance to such research. Further-
more, scripts which can be downloaded from the internet differ consider-
ably from what is actually said in movies. It is not surprising, then, that
studies based on scripts make claims about the non-spontaneity of movie
language. Although such scripts are a genre in their own right, they are
in fact inappropriate for investigations on real movie conversation.
As a start, one common difference is the length of the transcripts.
To take an example from the AMC, the total number of words transcribed
for the movie Shallow Hal is 11,490, whereas the script retrieved
from the web⁴ contains 10,660 words. Another important difference
is the number of occurrences of features which are typical of
spoken language: there are 49 occurrences of you know and 31 of I
mean in the transcription of the movie Shallow Hal, whereas they occur
38 and 23 times, respectively, in the web script.
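Because the two texts differ in length, the raw counts are easier to compare once they are normed to a common base. The following is a minimal sketch using only the figures reported above; the base of 1,000 words and the names per_thousand, figures and rates are assumptions made for illustration, not the norming procedure applied in Chapter 2.

```python
def per_thousand(count, total_words):
    """Norm a raw count to a rate per 1,000 words (the base is an assumption)."""
    return count / total_words * 1000


# Word totals and raw counts for Shallow Hal as reported in the text.
figures = {
    "AMC transcription": {"words": 11490, "you know": 49, "I mean": 31},
    "web script": {"words": 10660, "you know": 38, "I mean": 23},
}

rates = {}
for source, data in figures.items():
    rates[source] = {
        phrase: round(per_thousand(data[phrase], data["words"]), 2)
        for phrase in ("you know", "I mean")
    }
    print(source, rates[source])
```

Under these assumptions, both expressions remain more frequent in the AMC transcription than in the web script after norming (roughly 4.26 vs. 3.56 per 1,000 words for you know, and 2.70 vs. 2.16 for I mean).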
The following extracts, which have been taken from the movie corpus
used in this study (the AMC, cf. Section 2.4) and from the internet,
show the extent of this difference⁵. Extract 3 demonstrates that the same
scene has the same content in the two transcriptions, but different
wording: both texts start with an introduction and a joke, but in the AMC
transcription it is Andrea, and not Emily, who talks first, and the
sentence about the joke is more explicit (cf. “Great. Human Resources
certainly has an odd sense of humor. Follow me.” vs. “They do have an odd
sense of humor”); Andrea also has a different last name in the two texts
(Sachs vs. Barnes). Moreover, the web script (which is deliberately
presented here in its online script format) also contains a description of the
setting, which is not present in the transcription, given that this
information is not relevant to the analysis of the language used in the movie.
4 Cf. <http://www.script-o-rama.com/movie_scripts/s/shallow-hal-script-transcript-paltrow.html>.
5 It is worth noting that the scriptwriter him/herself sometimes adds a note
pointing out that the web script has not been written to represent the actual
words in the movie: “This transcript is not trying to get the movie word for
word, but close to it. This transcript is for reading purposes only!”;
source: <http://www.awesomefilm.com/script/MI2.html>.
Extract 3. A scene from The Devil Wears Prada: same content, different wording.
6
AMC transcription Web (tran)script
Andrea: Hi. Uh, I have an appointment with Emily Charlton?
Andrea: Yes.
Emily: Great. Human Resources certainly has an odd sense of humor. Follow me.
Emily: Okay, so I was Miranda’s second assistant… but her first assistant recently got promoted, and so now I’m the first.
Andrea: Oh, and you’re replacing yourself.
Emily: Well, I am trying. Miranda sacked the last two girls after only a few weeks. We need to find someone who can survive here. Do you understand?
Andrea: Yeah. Of course. Who’s Miranda?
6
Extract from the screenplay by Peter Hedges. Source:
Emily: Oh, my God. I will pretend you did not just ask me that. She’s the editor in chief of Runway, not to mention a legend. You work a year for her, and you can get a job at any magazine you want. A million girls would kill for this job.
Andrea: It sounds like a great opportunity. I’d love to be considered.
Emily: Andrea, Runway is a fashion magazine so an interest in fashion is crucial.
Extract 4, instead, which focuses on conversation only, and illustrates
the first words uttered in the movie and written in the script, shows
that the two texts have a totally different content (i. e. they start in
a completely different way), and that even scripts which have been
stripped of redundant information (like the setting) and which only
contain words do not correspond to what is actually said in the movies.
Extract 4. The very first words uttered in Catwoman: totally different content
7
AMC transcription Web (tran)script
Sinclair’s (2004b: 80) point of view is fundamental for the present work
for two reasons: first, because it draws attention to the fact that movie
language has its own distinctive features and thus strengthens the case
for searching for them; second, because it openly declares that movie
language has “a very limited value” since it does not reflect natural
conversation and, consequently, is “not likely to be representative of
the general usage of conversation”. The crucial missing element in the
comment is, however, empirical evidence, which is the objective of
the present work.
empirical studies. Movie language is described as artificial, yet, at the
same time, it is also considered a domain displaying elements of spon-
taneity (Taylor 1999, Pavesi 2005).
There is no doubt that in terms of literal spontaneity, movie lan-
guage is fictitious: it pretends to be authentic, but it is planned and
artificial. It is usually defined as non-spontaneous for three reasons:
first, because it is prefabricated; second, because it is written, or rather,
it is written to be spoken as if it were not written; third, because it
always implies some reciting, i. e. the speakers are actors who recite
and have to follow a script, or screenplay, even when they are asked
to improvise (Nencioni 1976, 1983; Gregory and Carroll 1978; Taylor
1999; Rossi 2003; Pavesi 2005).
There are also other factors which, it is claimed, create non-spon-
taneity: Taylor (1999) draws attention to the time and space constraints
of movies. Movie length is a constraint that leads to fictitiousness, in
that it obliges the scenes and language to be explicit and compact: there
is not much time and space for redundancy in a two-hour movie; con-
sequently, dialogic exchanges and scenes must be relevant and precise.
Even scenes that might be of interest are often edited out because of
the lack of space and time, although they are sometimes inserted in
extra sections of the final versions of DVDs.
Taylor (1999) also points out the need for movies to be appealing
and to be commercially successful, which naturally influences (and jus-
tifies) the linguistic choices made when building up the dialogue. Such
appeal is closely connected to two other important constraints, the
need to relate enthralling stories, and to prevent the audience from
losing track of the plot, which compromises the spontaneity of lan-
guage. These constraints make the spontaneity of language secondary
because, in the interest of box office sales, the story line needs to be
involving and clear. Strategies to achieve involvement and clarity can
be seen in the “excess of highly pertinent, dramatic or intriguing ex-
changes” of dialogues which are not “garbled”, but rather “clearly sepa-
rated” (Taylor 1999: 265–266). As an inevitable consequence, movie
conversation loses some spontaneity and acquires artificiality: even
when the audience is introduced to a scene that starts mid-conversa-
tion, which is supposed to recall spontaneous speech, the information
exchange is “often made artificially clear” (Taylor 1999: 267; cf. also
Pavesi 2005: 34). The same happens when an on-going conversation
is stopped by the introduction of a new scene and then re-presented:
the dialogue continues from the point it was at before the interrup-
tion, regardless of the time that has passed. Similarly, when an initial
topic of conversation gives way to a series of subtopics, so as to re-
semble spontaneous speech, movie dialogue still tends to “stick to the
point” (Taylor 1999: 267), whereas in spontaneous conversation to-
tally different subjects can easily emerge and then be abandoned.
Another artificial strategy used to help the audience keep track of
the movie is the introduction of a carefully planned rhythm of the
dialogue, which is slower and clearer than in naturally occurring con-
versation (Pavesi 2005). This is closely bound to another difference
from spontaneous conversation, mentioned by Quaglio and Biber
(2006) in connection with the TV series Friends. The language in
Friends “has almost no overlaps, to avoid possibility of misunderstand-
ings by the audience” and, “at the discourse level,” it has “far fewer
repetitions and interruptions” than those usually found in natural con-
versation (Quaglio and Biber 2006: 716–717). In a similar way, it is
claimed that features which are usually abundant in real conversation,
like discourse markers and vocatives, do not occur very frequently in
movies (cf. Chaume 2004b: 850; Pavesi 2005: 32).
Alongside the artificial strategies mentioned so far, some features
are mentioned in the literature which are, in fact, traces of spontane-
ity. The claim made here is that these traces of spontaneity depend
on the typical determinants of natural conversation. Thus, it is pos-
ited that these determinants can analogously be called determinants of
movie conversation.
Consequently:
(a) movie conversation takes place in the spoken medium and oc-
curs with non-verbal paralinguistic features;
(b) movie conversation not only pretends to take place in real time,
but actually does take place in real time, if real time is perceived
as an ongoing process. Although movies are pre-recorded and not
impromptu events, the audience perceives that something is hap-
pening while watching the movie: “the visual medium with mov-
ing images and the potential of exploiting the written and spo-
ken codes at the same time enhances the sense of immediacy”
(Mansfield 2006: 34; cf. also Pavesi 2005: 30);
(c) movie conversation usually takes place in a shared context;
(d) movie conversation is interactive, continuous, and expressive of
politeness, emotion, and attitude.
In detail, the presence of deictics and elisions, reported by Taylor
(1999) and Rossi (2003), is due to determinants (a) and (c): the
fact that movie conversation takes place in the spoken medium with
non-verbal paralinguistic features (determinant a) and in a shared
context (determinant c) implies that speakers and listeners can rely
on implicit meaning or reference, and thus avoid elaboration or
specification of meaning.
Incomplete utterances, self-corrections / repairs, reformulations,
repetitions, insert breaks / pauses (Rossi 2003), and/or overlapping
conversation (Taylor 1999) are all effects of determinant (b), which is
typically realized as the normal dysfluency and fragmented language of
any conversation which takes place in real time.
Finally, the use of inserts, hesitators, vocatives, hedges, adjacency
pairs (question / answer), short and phatic devices, expletives, fillers,
tag questions, and discourse markers (Taylor 1999; Rossi 2003;
Quaglio and Biber 2006; Forchini 2010) can all be seen as realizations
of determinants (b) and (d): the on-the-fly and interactive character
of conversation leads speakers to use the same repertoire of expressions
to keep the conversation going.
The vocabulary of movie language offers further evidence of speak-
ers using repetitive structures when talking, which can be ascribed to
determinants (b) and (d). Indeed, it seems to favor a core vocabulary,
which usually avoids literary and dialectal terms, jargon and techni-
cisms (Pavesi 1994, 1996; Taylor 1999). This is a very interesting trait
for, although Pavesi (2005) points out the presence of artificial ele-
ments in movies8, the use of some of these features, like the simplifi-
cation of syntactic structures (cf. Biber et al. 1999) and the basic core
just mentioned, recalls the basic, or core, vocabulary typical of spon-
taneous spoken conversation (cf. McCarthy 1999: 2 and Section 1.2).
Another feature of movie language is that it tends to be informal,
due to the typical linguistic traits of spontaneous conversation illus-
trated so far. This feature is linked to the interactive determinant (d),
which highlights the interpersonal meta-function at work in movie
language, establishing and maintaining social relations (Halliday
1985a). An example of this informality is the use of syntactic-prag-
matic strategies like contractions (Quaglio and Biber 2006), fronting
(Taylor 1999), dislocations (Taylor 1999, Pavesi 2005), clefts (Pavesi
2005), and of dialogic two-grams9 such as are you, do you, all right,
come on, thank you (Forchini 2010).
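To make the notion of a two-gram concrete, the following minimal Python sketch counts contiguous two-word sequences in a list of utterances. The tokenizer, the function name `count_ngrams`, and the sample utterances are illustrative assumptions; they are not the procedure or the data used for the corpora discussed here.

```python
import re
from collections import Counter

def tokenize(utterance):
    """Lowercase an utterance and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", utterance.lower())

def count_ngrams(utterances, n=2):
    """Count contiguous n-word sequences, never crossing utterance boundaries."""
    counts = Counter()
    for utterance in utterances:
        tokens = tokenize(utterance)
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Hypothetical sample utterances, invented for illustration only.
sample = ["Do you understand?", "Come on, do you know her?", "All right, thank you."]
two_grams = count_ngrams(sample, n=2)
```

Counting within utterances only is a design choice of this sketch: it prevents the last word of one turn and the first word of the next from being treated as a single sequence.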
Interestingly, these recurring interactive features in movie conver-
sation occur in various environments, similarly to spontaneous con-
versation. Pavesi (2005: 30) notes that dialogic exchanges between col-
leagues, friends, or neighbors may take place in any location, in
restaurants, at the mall, at the hairdressers’, etc. Equally, movie dia-
logue occurs in any kind of interaction, in symmetric (between equals)
or a-symmetric relationships (superior-inferior, doctor-patient, teacher-
learner), and in more than one language: movie dialogue can display
8 Cf. the leveling out of sociolinguistic variation (i. e. dialectal traits, local and
colloquial tones are often deleted and/or simplified); of syntactic structures
(i. e. monoclausal utterances are usually preferred and subordination tends to
be distributed homogeneously, cf. Pavesi 2005: 32 and Rossi 2003: 103); of
lexical choices (i. e. usually movies offer the same core vocabulary, avoiding
literary and dialectal terms, jargon and technicisms, cf. Pavesi 2005: 33); of
turn taking and utterances (i. e. the latter tend to employ the same number of
words, cf. Pavesi 2005: 32); and of dialogues (like, for example, the reduced
and predictable use of phatic devices, interjections and discourse markers, Pavesi
2005: 34), which are often stereotyped.
9 N-grams (Fletcher <http://phrasesinenglish.org/>) are also referred to as lexical
bundles (Biber et al. 1999), sequences of words (Hunston 2006), clusters (Scott
and Tribble 2006), phrasal units (Stubbs 2007), etc.
plurilinguism, code-switching (the movement from one language to
another), and code-mixing (hybridization) (Rossi 2003: 113).
This introductory section has illustrated that the literature describes
movie language as both fictitious and spontaneous. Movie speech can
be labeled “quasi-speech”, as Sinclair (2004b: 80) calls it, if the term
“speech” is identified with spontaneous spoken language; movie lan-
guage is indeed a spoken variety, but it is first written and then re-
cited. It cannot therefore be said to be 100% spoken: spontaneity is
just an illusion. Furthermore, some of the distinctive characteristics
of movie dialogue that contribute significantly to its non-spontaneity
are necessarily “imposed by the televized medium” (Quaglio and Biber
2006: 716) and are needed to fulfill a number of functions, such as
contributing to the unfolding and comprehension of the narrative.
Clear and concise dialogues, together with explicit and linear breaks
and blocks of information, help the audience understand what is go-
ing on (Taylor 1999, Pavesi 2005). It is curious to note that audiences
easily accept the non-spontaneous anomalies of movies (Pavesi 2005:
30), proving that movie non-spontaneity is not limiting. Besides, the
fact that movie dialogues imitate real dialogue without necessarily in-
cluding all the typical features of spontaneous spoken discourse (Taylor
1999; Chaume 2004b; Pavesi 2005) makes them a peculiar conversational
domain and a variety of their own.
As for the degree of spontaneity present in movie language, some
scholars base their judgment on intuition, while others report defi-
nite instances of spontaneity. However, this evidence is not founded
on a systematic, quantifiable investigation, but is either qualitative, or
based on scripts, which have been shown to be inadequate represen-
tations of movie conversation. It seems clear that the status of movie
language needs clarification on the basis of empirical evidence.
The Multi-Dimensional Analysis presented here offers a docu-
mented analysis of movie dialogue. Through Biber’s methodology, it
is possible to demonstrate empirically the similarity or difference be-
tween movie and face-to-face conversation.
1.4 Biber’s Multi-Dimensional Analysis
The Dimensions usually considered in Multi-Dimensional Analysis
are represented by the following five Factors10 (Biber 1988):
– Factor 1, which represents the informational (negative) vs. involved
(positive) production dimension (Dimension 1), marks “high infor-
mational density and exact informational content versus affective,
interactional, and generalized content” (Biber 1988: 107). Factor
1 is determined by two main parameters: the primary purpose of
the writer / speaker, which can be either informational or interac-
tive, affective, and involved; and the production circumstances,
which can be characterized by either careful editing, precision in
lexical choices and an integrated textual structure, or by general-
ized lexical choices and fragmented presentation of information.
– Factor 2, which represents the narrative (positive) vs. non-narrative
concerns (negative) dimension (Dimension 2), “can be considered as
distinguishing narrative discourse from other types of discourse”
(Biber 1988: 109). More specifically, narrative concerns are marked
by the presence of past time, third person animate referents, re-
ported speech, and details, whereas non-narrative concerns are
marked by immediate time and attributive nominal elaboration.
– Factor 3, which represents the explicit (positive) vs. situation-depend-
ent (negative) reference dimension (Dimension 3), distinguishes “be-
tween highly explicit, context-independent reference and non-spe-
cific, situation-dependent reference” (Biber 1988: 110). Wh-relative
clauses, for instance, specify the identity of referents explicitly, whereas
time and place adverbials are dependent on referential inferences
(Biber 1988: 110).
10 Biber (1988 and 1995) and Conrad and Biber (2001) also consider two other
Factors (Factor 6 and 7), namely, the dimensions regarding “On-line Informa-
tional Elaboration Marking Stance” and “Academic Hedging”. These two fac-
tors (i. e. Dimensions 6 and 7 respectively) are not taken into account here for
they are considered still tentative by the literature given the difficulty of their
interpretation (cf. Conrad and Biber 2001: 39). It is worth noting, however,
that face-to-face conversation is usually unmarked in the use of the features
associated with these Dimensions.
– Factor 4, which represents the overt expression of persuasion (posi-
tive) dimension (Dimension 4), “marks the degree to which persua-
sion is marked overtly” (Biber 1988: 111). Biber holds that predic-
tion, necessity, possibility modals, together with infinitives, conditional
subordination, suasive verbs, and split auxiliaries mark persuasion.
– Factor 5, which represents the abstract (positive) vs. non-abstract
(negative) information dimension (Dimension 5), “seems to mark in-
formational discourse that is abstract, technical, and formal versus
other types of discourse” (Biber 1988: 113). Conjuncts, agentless
passive verbs, by-passives, and passive postnominal modifiers, inter
alia, have positive weights on this factor.
These factors are considered Dimensions in that they define “continuums
of variation rather than discrete poles” (Biber 1988: 9). This means
that Multi-Dimensional Analysis describes texts that are to be inter-
preted as more or less formal, narrative, explicit, etc., rather than either
formal or non-formal, narrative or non-narrative, explicit or situation-
dependent, etc. This approximation is clarified by the following texts
taken from Biber (1988: 10–12). Text 1 is an example of conversation,
while Text 2 is an example of scientific exposition. These contrasting
types might seem to be entirely different, because conversation involves
ordinary language, which is unplanned and interactive, whereas scientific
exposition is specialized, planned and non-interactive, but by looking
at Text 3, an example of a panel discussion, it is evident that these
parameters do not define clear-cut dimensions, but rather indicate a
continuum. Consequently, Text 1 can be described as less specialized,
less planned and more interactive than Text 2, and Text 3 as lying
between the two.
Text 2. Scientific exposition (Biber 1988: 10).
This means that if the factor analysis of a corpus reveals, for example,
that the occurrence of first person pronouns in a text is high, it can
then be expected that questions will occur to a similar extent; con-
versely, when first person pronouns are absent from a text, it is likely
that questions are absent too (Biber 1988: 80). Interestingly, research
has also led to the hypothesis of the existence of universal dimensions
of register variation, in that some dimensions seem to occur across lan-
guages and across general and restricted discourse domains (Biber 2004:
17).
Last but not least, through factor analysis, Biber’s (1988) Multi-
Dimensional approach provides both quantitative and qualitative
methods which empirically identify and interpret patterns which co-
occur as underlying dimensions of variation. It is useful to note that
this approach also works on small proportions of corpora: even when
corpora are split into small parts and then investigated, factor analy-
sis provides nearly the same dimensions of variation, as long as the
samples of the corpora include an equivalent range of register varia-
tion (Biber 2004: 16).
Chapter 2
The Making of: Methodology and Data
2.1 Introduction
The next sections provide methodological information about Biber’s
(1988) approach, the compilation of the American Movie Corpus, cre-
ated to explore movie language, and the Longman Spoken American
Corpus, used to investigate face-to-face conversation.
Two distinct analyses are presented here: at a macro level, Biber’s (1988)
Multi-Dimensional Analysis is adopted to determine a) the text type
to which movie language belongs, and b) the co-occurrence relations
among linguistic features in face-to-face and movie conversation. At
a micro level, instead, the occurrences of specific linguistic features such
as single words, lexical bundles, and multi-word sequences and pat-
tern types are explored, using corpus-driven criteria (Francis 1993,
Tognini-Bonelli 2001, Biber 2009).
In Multi-Dimensional Analyses, the co-occurrence patterns, which
are dimensions of variation underlying the text, are identified first
quantitatively using factor analyses, and then qualitatively, interpret-
ing their function. In order to apply this approach illustrated in Chap-
ter 1, the following eight methodological steps, suggested by Biber
(1988, 1995, 2004), were followed:
1. The corpus design, collection, transcription, and input into the com-
puter: the Longman Spoken American Corpus and the American
Movie Corpus, collected in electronic format and transcribed ac-
cording to corpus building criteria (cf. Sections 2.3 and 2.4), are
the sources of the data;
2. Identification of the linguistic features and of their functional asso-
ciations to be included in the analysis: the linguistic features and
their functional associations which are identified by Biber (1988)
and are associated with Factors 1–5 and their corresponding Dimensions
illustrated in Chapter 1 are considered;
3/4. Development of computer software programs which tag all relevant
linguistic features in the corpus and automatic tagging of the corpus
and editing of the texts to check whether the linguistic features are
accurately identified: both the corpora used were kindly tagged and
processed for Multi-Dimensional Analyses by Douglas Biber with
the Biber grammatical tagger he developed;
5. Counting of each linguistic feature in each text of the corpus via ad-
ditional computer programs: both the corpora were processed by
Douglas Biber with the SAS software package for statistical analy-
ses he adapted for linguistic studies; with the aid of the SAS soft-
ware package, the grammatical features mentioned in point (4)
were turned into the underlying Dimensions characterizing the two
conversational domains investigated here. The software programs
MonoConc Pro Version 2.0 (published by Athelstan) and Oxford
Wordsmith Tools 4.0 (developed by Scott 1998) were also used to
explore the occurrences of some linguistic features11;
6/7. Factor analysis of the co-occurrence patterns among linguistic features
and functional interpretation of the factors as underlying dimensions
of variation: Factors 1–5 were identified, first, quantitatively via
factor analyses, then, qualitatively via functional interpretations;
11 Both programs offer similar features, such as the ability to generate wordlists,
concordances and collocations, and they both can handle large tagged or
untagged corpora. The reason for using two software programs which can per-
form similar tasks was to compensate for their individual limits: MonoConc
Pro, for instance, can split the screen display and expand the context of the
node by highlighting the line in a more user-friendly way than Wordsmith Tools
4.0, while Wordsmith Tools 4.0 provides useful plots which give information
about the distribution of an occurrence in a single text or across texts, and
cluster information (cf. Reppen 2001b on the differences between the two
software programs). In particular, WordList, Concord, and Plot were used from
Wordsmith Tools 4.0: the first created lists of the single words or word-clusters
in the texts, set out in alphabetical or frequency order; the second retrieved
words or phrases in context to see the company they keep; and the third pro-
vided information about the distribution of an occurrence in a single text or
across texts (Scott 1998, Scott and Tribble 2006).
8. Computing of the dimension scores for each text; comparison of the
mean dimension scores for each register to analyze the salient lin-
guistic similarities and differences among the registers being studied:
the scores that emerged from the Multi-Dimensional Analysis
were computed and compared to analyze the salient linguistic
similarities and differences between face-to-face and movie con-
versation.
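The last step can be sketched in Python as follows. In Biber (1988), a dimension score is obtained by standardizing each feature's normed frequency and then summing the standardized scores of the features with salient positive loadings and subtracting those of the features with salient negative loadings. The sketch assumes that reference means and standard deviations are already available; the feature names, the figures, and the function names are hypothetical placeholders, not values from Biber (1988) or from the corpora studied here, and the code does not reproduce the SAS processing mentioned above.

```python
from statistics import mean

def z_score(value, ref_mean, ref_sd):
    """Standardize a normed frequency against a reference mean and standard deviation."""
    return (value - ref_mean) / ref_sd

def dimension_score(freqs, reference, positive, negative):
    """Sum z-scores of positive-loading features, subtract those of negative-loading ones."""
    def z(feature):
        ref_mean, ref_sd = reference[feature]
        return z_score(freqs[feature], ref_mean, ref_sd)
    return sum(z(f) for f in positive) - sum(z(f) for f in negative)

def mean_dimension_score(texts, reference, positive, negative):
    """Average the dimension scores of all texts belonging to one register."""
    return mean(dimension_score(t, reference, positive, negative) for t in texts)

# Hypothetical placeholders: feature -> (reference mean, reference standard deviation).
reference = {"feature_a": (10.0, 2.0), "feature_b": (50.0, 10.0)}
# Hypothetical normed frequencies (per 1,000 words) for two texts of one register.
register_texts = [
    {"feature_a": 14.0, "feature_b": 40.0},
    {"feature_a": 12.0, "feature_b": 50.0},
]
score = mean_dimension_score(register_texts, reference, ["feature_a"], ["feature_b"])
```

Comparing two registers then amounts to computing one such mean per register and per Dimension and setting the values side by side.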
In corpus-driven analyses, the specific linguistic features are identified
both quantitatively and qualitatively. The quantitative analyses pro-
vide the frequency of the occurrences of such features, whereas the
qualitative analyses investigate functions in context to highlight func-
tional similarities and/or differences in face-to-face and movie con-
versation. When the occurrences of the items analyzed are too numer-
ous, the analyses are performed on a sample selection of the data and/
or via hypothesis testing, following the suggestions of Sinclair (1999),
identified by Hunston (2002: 52):
Sinclair (1999) advocates selecting 30 random lines, and noting the patterns
in them, then selecting a different 30, noting the new patterns, then another
30 and so on, until further selections of 30 lines no longer yield anything
new. An adaptation of this method is ‘hypothesis testing’, in which a small
selection of lines is used as a basis for a set of hypotheses about patterns. Other
searches are then employed to test those hypotheses and form new ones.
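The sampling procedure quoted above can be sketched in Python as a loop that stops once a batch of lines adds no new pattern. This is only one possible reading of the method: the function name, the fixed random seed, and above all `extract_patterns` (a caller-supplied function that returns the pattern labels noted in one concordance line) are hypothetical, since in practice the patterns are noted by the analyst rather than computed.

```python
import random

def sample_until_saturated(lines, extract_patterns, batch_size=30, seed=0):
    """Draw random batches of lines until a batch reveals no new pattern.

    `extract_patterns` is a hypothetical caller-supplied function that returns
    the set of pattern labels observed in one concordance line.
    """
    rng = random.Random(seed)      # fixed seed so the sketch is repeatable
    pool = list(lines)
    rng.shuffle(pool)              # random order, sampling without replacement
    seen = set()
    batches_read = 0
    for start in range(0, len(pool), batch_size):
        batch = pool[start:start + batch_size]
        found = set()
        for line in batch:
            found |= set(extract_patterns(line))
        batches_read += 1
        if found <= seen:          # nothing new in this batch: stop
            break
        seen |= found
    return seen, batches_read
```

The function returns both the patterns collected and the number of batches read, so that the amount of data actually inspected can be reported alongside the findings.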
The reasons for following Biber’s (1988) methodology have been stated
in Section 1.4 – i. e. it is reliable, it can predict linguistic features in
texts and can offer both quantitative and qualitative analyses. The rea-
sons for opting for authentic-data oriented analyses, instead, include
the facts that:
– they allow for descriptions of naturally occurring combinations of
words as opposed to the limiting and sometimes deviating traits
of intuition (cf. Sinclair 1991, 2006; Stubbs 2001; Börjas 2006;
Johansson 2007; Svartvik 2007);
– they allow for both quantitative and qualitative research (cf. Biber,
Conrad, and Reppen 1998; Aarts 2001);
– they are empirical and lead researchers toward descriptivism as op-
posed to prescriptivism (cf. Halliday 2003c; Biber, Conrad, and
Reppen 1998; Sinclair 2006);
– they can be collected in electronic format in large databases (i. e.
corpora) and can, consequently, be easily and quickly processed,
replicated, and shared (cf. Sinclair 1991, 2004a; Stubbs 2001;
Wynne 2004; Börjas 2006).
12 I was kindly given access to the LSAC by Douglas Biber and Randi Reppen.
13 In particular, the Longman Spoken American Corpus is owned by Pearson Edu-
cation and was gathered by Professor Jack Du Bois and his team at the Univer-
sity of California at Santa Barbara (UCSB).
2.4 The American Movie Corpus (AMC)
Me, Myself & Irene (Bobby and Peter Farrelly 2000); Meet the Parents
(Jay Roach 2000); Finding Forrester (Gus Van Sant 2000); Shallow Hal
(Bobby and Peter Farrelly 2001); Ocean’s Eleven (Steven Soderbergh
2001); One Hour Photo (Mark Romanek 2002); The Matrix Reloaded
(Andy and Larry Wachowski 2003); Catwoman (Pitof 2004), and The
Devil Wears Prada (David Frankel 2006).
to American face-to-face conversation, the movies selected had to sat-
isfy certain parameters. To be selected for the AMC, movies had to:
(a) be produced in the United States from 2000 on;
(b) be acted / spoken mostly in American English;
(c) not be set in previous centuries and eras;
(d) have ordinary life settings.
Parameters (a) and (b) determine the kind of domain and variety un-
der examination, i. e. American movie language. Parameter (b) also re-
flects the idea of dialogue in action, that is to say, dialogue had to be
present in the movie selected (e. g. narrated movies, documentaries and,
of course, silent movies had to be excluded). Parameter (a), together
with (c), guarantees the contemporaneity of the movies and param-
eter (d) ensures that the language spoken in the movies selected is the
ordinary language of ordinary people: specialized language, for exam-
ple, is not the focus of the present study; consequently, movies con-
taining a high proportion of political debates, academic speeches, le-
gal language and other specific domains were not included. It is worth
noting that, even though some of the characters of the movies selected
have extraordinary powers, they are humans who lead ordinary lives
– Neo in The Matrix Reloaded, for instance, is a man who works in
ordinary information technology and is not a robot or a machine.
Finally, the AMC movies were categorized as belonging to genres
of comedy, non-comedy (i. e. drama, sci-fi, thriller, action, adventure,
crime, fantasy movies) or borderline (when the categorization could
not be clear-cut). On the one hand, including different genres satis-
fied the parameters of balance and representativeness, which require
the full range of linguistic variation existing in the language (Biber
1993, Kennedy 1998). On the other, this provided an opportunity
to see whether genre variation influences the frequency of the spoken
devices analyzed. Movie genre is, in fact, a complex issue: it can be
seen in Table 1 that different reference works categorize movies in dif-
ferent ways. It is also interesting to note that movies usually do not
belong to only one genre. Accordingly, the AMC components are de-
fined along a comedy/non-comedy continuum, which takes into ac-
count the categorization provided by Morandini et al. (2006) and by
the Internet Movie Database15 (henceforth IMDB). Table 1 illustrates
the AMC components grouped according to their genre: four movies
are considered to be 100 % comedies, which is in tune with both
Morandini et al. (2006) and the IMDB classifications. Similarly, four
others are considered to be 100 % non-comedies. Three movies are
labeled as not genre specific, since the two reference works selected
characterize them in different ways. Due to this ill-defined status, these
movies are referred to as borderline movies in the analysis.
The idea behind building the movie corpus is to set the foundation
stones of a large, standard reference corpus for future studies of the
movie language it represents (cf. Section 2.4.2). However, since a finite
number of words are often determined at the beginning of a corpus-building
project (Sinclair 1991, McEnery and Wilson 1996), a finite number of
movies were selected to start with. As stated above (cf. Section 2.4), the
transcriptions of the eleven American movies, together with their dubbed
Italian versions, make up a 204,636-word corpus (nearly 44 hours of movie
conversation). Undoubtedly, by some standards, the AMC is a small corpus.
Sinclair (2004a) holds that big corpora are usually favored for linguistic
research due to the fact that they are more likely to include regularities
of language. However, as Kennedy (1998: 68) points out, “a huge corpus does
not necessarily ‘represent’ a language or a variety of a language any better
than a smaller corpus”; indeed, everything depends on the patterns present
in the corpus and the consequent generalizations that can be made
about them.
15 <http://www.imdb.com/>.
Another issue relevant to the size of the AMC is comparability: ide-
ally, two (or more) corpora which are compared should be of the same
size; however, if they are not, data comparison is still feasible by means
of norming the numbers. As Biber, Conrad, and Reppen (1998: 263)
report, it is always possible to adjust data via normalization when cor-
pora are not of the same length, and raw frequency counts are not
directly comparable. Texts of different length can thus be compared
accurately: the raw frequency count is multiplied by whatever basis is
chosen for norming, and then divided by the total number of words.
The basis I chose for norming is 1,000, so the raw frequency count
was multiplied by 1,000, and was then divided by the total number
of words in the text.
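The norming step described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `normalize` and the counts in the example are hypothetical and are not taken from the AMC or the LSAC.

```python
def normalize(raw_count: int, total_words: int, basis: int = 1000) -> float:
    """Return the frequency of an item per `basis` words of text."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    # Multiply by the norming basis first, then divide by the text length.
    return raw_count * basis / total_words


# Hypothetical counts, chosen only to illustrate the arithmetic.
short_text = normalize(raw_count=30, total_words=2_000)
long_text = normalize(raw_count=120, total_words=10_000)
print(short_text, long_text)  # 15.0 12.0
```

Although the longer text contains four times as many raw occurrences, its normed frequency is lower, which is why raw counts from texts of different length cannot be compared directly.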
The last factor taken into account in compiling the corpus is linked
to its format: the manually transcribed movie dialogues were stored
in Text, Word, and Excel files (.txt, .doc, and .xls respectively). A ma-
chine-readable format favors both quantitative and qualitative data
studies; it allows for computerized storing, quick searching and ma-
nipulating of data, which can be easily enriched with extra informa-
tion. Furthermore, the machine-readable format allows for objectiv-
ity and replicability of the studies, and ensures that information can
be exchanged within the scientific community, and the data re-used.
The .txt, .doc, and .xls formats were chosen to facilitate data process-
ing and information retrieval. In particular, the .txt files were neces-
sary for the Wordsmith Tools 4.0 software (used for concordances and
frequency counting), whereas the .doc and the .xls files were compiled
to include extra information such as the name of speakers, setting and
relevant extra-linguistic features, which are not included in the
Wordsmith Tools 4.0 analyses.
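To give an idea of what a plain-text transcript makes possible, the following Python sketch counts word frequencies and builds a simple keyword-in-context concordance. It is not the WordSmith Tools procedure; the tokenization rule and the helper names `tokenize`, `word_frequencies` and `kwic` are assumptions made for illustration only.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split a transcript into word tokens (simplified, assumed rule)."""
    return re.findall(r"[A-Za-z']+", text)


def word_frequencies(text: str) -> Counter:
    """Count case-insensitive word frequencies."""
    return Counter(token.lower() for token in tokenize(text))


def kwic(text: str, keyword: str, width: int = 2) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every hit."""
    tokens = tokenize(text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# A line of movie dialogue quoted in Chapter 3, typed with plain apostrophes.
sample = ("I know but I got sneakers in my backpack I'm just gonna change. "
          "It'll just take a second.")
print(word_frequencies(sample)["just"])
for left, key, right in kwic(sample, "just"):
    print(f"{left:>15} [{key}] {right}")
```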
2.4.2 Transcription Criteria and Tagging
Speaker Identification: At the beginning of each transcript, the speaker is given a unique
identifier if the name is not present. In this case the speaker's gender is also indicated.

Capitalization: Capitalization is used as an aid for human comprehension of the text. The
accepted standard way to capitalize words, including words at the beginning of a sentence,
proper names, and so on are followed.

Abbreviations: When abbreviations are used as part of a personal title, they can remain as
abbreviations:
    Mr. Brown
    Mrs. Jones
    Dr. Spock
However, when they are used in any other context, they are written out in full, e. g.:
    I went to the junior league game.
    I'm going home to see the missus
    I went to the doctor, and all he said was, don’t worry, it’s natural.
    Hey mister, do you know how to get to the stadium?

Contractions and Apostrophe -s: Table 3 below illustrates what is considered standard written
English with respect to contractions.
16 <http://www.ldc.upenn.edu/About>.
17 Cf. also <http://projects.ldc.upenn.edu/SBCSAE/transcription/csae-conventions.html#ortho>.
Table 3. Transcription conventions used for the AMC compilation for contractions
and apostrophe s18.

The orthographic transcriptions were then checked for accuracy by
native speakers of English and Italian who were not involved in the
transcription. There were three reasons for choosing orthographic
transcription. First of all, it provides a representation of spoken
language which is simple to read and understand: compared to IPA
transcription, for instance, orthographic transcription is easier because
it requires less effort and knowledge (cf. Halliday 1985b). Secondly, it
allows for immediate computing processes such as frequency and
concordancing information retrieval: orthographic transcription is the
format which concordancers usually read. Thirdly, even though
orthographic transcription is “an imperfect written approximation of a
speech event” (Kennedy 1998: 82), and cannot capture all the features
of spoken conversation (Halliday 1985b, Wichmann 2007), it forms
the basis for all other transcriptions and annotations. This means that
the corpora could easily be tagged, and thus enriched with extra
information, so as to be processed by the SAS software package used in
Multi-Dimensional Analysis. Both the LSAC and the AMC were
tagged with the Biber grammatical tagger: Extract 5 illustrates a tagged
sentence (I know we haven’t been together that long, but these last ten
months have just been the happiest of my life) from the AMC.

Extract 5. An example of tagged AMC.

18 The contractions in bold are not allowed in the LDC transcription guides.
However, they were kept in the present research for two main reasons: firstly,
because they reflect what is actually said in the movies; secondly, because they
are also present in the Longman Spoken American Corpus, which is the corpus
used for the present comparative study.
19 For further details see Biber, Conrad and Reppen (1998: 259).
Chapter 3
Shot 1: Multi-Dimensional Analysis
of Face-to-Face and Movie Conversation
3.1 Introduction
Table 4. Linguistic features of movie and face-to-face conversation20.
20 The variables in all the tables are the linguistic features analyzed (for the mean-
ing of the codes of the specific features see Appendix 1). N stands for the
number of texts selected (with regard to movie conversation the number 3
refers to the three sub-genres, or sub-corpora, labeled comedies, borderline
movies, and non-comedies; with regard to face-to-face conversation 327 refers
to the number of texts included in the LSAC). Mean is the average frequency
of items. The frequency counts of all linguistic features are normalized to a text
length of 1,000 words.
This preliminary observation is extremely relevant as it presupposes
that the features shared by face-to-face and movie conversation also
serve similar functions (Biber 1988) and, consequently, have similar
textual Dimensions (Biber, Conrad and Reppen 1998). Table 5 clearly
illustrates this: face-to-face and movie conversation are quantitatively
and qualitatively similar since they have four Dimensions out of five
in common and display a positive score with respect to Dimension 1
and 4, and a negative score with respect to Dimension 2 and 3. The
only Dimension they differ on is Dimension 5.
21 The label Variable stands for the 5 Dimensions (or Factors, i. e. dim1-5 in the
table) taken into account; N for the number of texts (or sub-corpora) in the
two corpora considered; Mean for the mean (average) frequency of items (the
higher it is, the more frequent the items are); Std Dev for standard deviation,
namely, a measure of the spread of the distribution; and Minimum and
Maximum for the minimum and maximum frequencies of items respectively.
Biber, Conrad and Reppen (1998: 280) explain that in all Multi-Dimensional
studies “frequencies are standardized to a mean of 0.0 and a standard deviation
of 1.0 before factor scores are computed. This process translates the scores for
all features to scales representing standard deviation units, thus, regardless of
whether a feature is extremely rare or extremely common in absolute terms, a
standard score of +1 represents one standard deviation unit above the mean
score for the feature in question. That is, standardized scores measure whether
a feature is common or rare in a text relative to the overall average occurrence
of that feature. The raw frequencies are transformed to standard scores so that
all features on a factor will have equivalent weights in the computation of
Dimension scores. If this process was not followed, extremely common
features would have much greater influence than rare features on the Dimension
scores.”
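The standardization described in the quotation can be sketched as follows. This is a minimal illustration, not the SAS procedure used in the study: the function name `standardize` is hypothetical, the feature frequencies are invented, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def standardize(frequencies: list[float]) -> list[float]:
    """Convert normed frequencies of one feature into standard scores."""
    mu = mean(frequencies)
    sigma = pstdev(frequencies)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("the feature does not vary across texts")
    return [(f - mu) / sigma for f in frequencies]


# Invented normed frequencies of a rare and a common feature in three texts.
rare_feature = [1.0, 2.0, 3.0]
common_feature = [100.0, 110.0, 120.0]

z_rare = standardize(rare_feature)
z_common = standardize(common_feature)
print([round(z, 2) for z in z_rare])
print([round(z, 2) for z in z_common])
```

After standardization the two features are on the same scale (mean 0.0, standard deviation 1.0), so the common feature no longer outweighs the rare one when scores are summed.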
This means that the traits they share also serve similar functions: both
face-to-face and movie conversation are characterized by involved
production, non-narrative concerns, situation-dependent reference, and a
low level of persuasion22.
These findings, which are illustrated in Chart 1 and are examined
in detail in the next sections, offer a new perspective on movie
language by bringing to light its similarity with face-to-face conversation.

Chart 1. Multi-Dimensional Analysis of face-to-face and movie conversation.
22 As illustrated in Chapter 1, Factor 1 (dim1 in Table 5) displays informational
versus involved production, namely, a Dimension which marks “high
informational density and exact informational content versus affective, interactional,
and generalized content” (Biber 1988: 107). Factor 2 (dim2) represents
narrative versus non-narrative concerns, a Dimension which “can be considered as
distinguishing narrative discourse from other types of discourse” (Biber 1988:
109). Factor 3 (dim3) concerns explicit versus situation-dependent reference, a
Dimension which distinguishes “between highly explicit, context-independent
reference and nonspecific, situation-dependent reference” (Biber 1988: 110).
Factor 4 (dim4), the only Dimension which is exclusively positive, reveals the
overt expression of persuasion, a Dimension which “marks the degree to which
persuasion is marked overtly” (Biber 1988: 111). Finally, Factor 5 (dim5)
reflects abstract versus non-abstract information, a Dimension which “seems to
mark informational discourse that is abstract, technical, and formal versus other
types of discourse” (Biber 1988: 113).
3.2 Informational vs. Involved Production (Dimension 1)
Chart 2. Dimension 1: Informational vs. Involved production.
(24.60 vs. 19.00); and a moderately high frequency of discourse par-
ticles (14.00 vs. 12.60), demonstrative pronouns (13.14 vs. 11.56), and
emphatic adverbs and qualifiers (e. g. just, really, so: 11.83 vs. 9.10).
Table 6. Linguistic features with a positive weight on Dimension 1 in the LSAC and in
the AMC.
The following mean scores and extracts illustrate the presence in both
face-to-face and movie conversation of the linguistic features which
have a positive weight on Dimension 1 and, consequently, character-
ize a discourse which is expressive of thoughts, private attitudes and
emotions and which is highly interactive. More specifically, Extracts
6a and 6b offer some examples of verb forms, tenses and types which
have very similar mean scores in both corpora: verbs (uninflected
presents, imperatives & third persons) 118.21 (LSAC) vs. 117.23 (AMC);
private verbs 29.49 (LSAC) vs. 24.40 (AMC).
Speaker3: Consider what we have seen, Councillor. Consider that in
the past 6 months we have freed more minds than in 6 years.
This attack is an act of desperation. I believe very soon the
prophecy will be fulfilled and this war will end.
[…]
Speaker1: Mrs. Larson? It uh it won’t be much longer, Mrs. Larson.
Speaker2: Oh well is he in a lot of pain?
Speaker1: No No no. There will be no more pain for your husband.
He’s heavily sedated.
Speaker2: OK I think I’m gonna go, send little Hal in now.
Speaker1: No no no I don’t think that’s such a good idea. With all the
painkillers uh the reverend’s not exactly himself.
Speaker2: Look I think my boy has a right to say goodbye to his fa-
ther I mean the man means everything in the world to him.
Speaker1: I just was in the neighborhood got off work early.
Thought maybe you wanted to get a bite to eat.
Speaker2: Oh, that’s very sweet. What a nice surprise.
Speaker1: Oh shoot I forgot to change my shoes.
Speaker2: That’s OK you don’t have to change. You know I can’t resist
a man in nurse’s shoes.
Speaker1: I know but I got sneakers in my backpack I’m just gonna
change. It’ll just take a second.
of possibility (can, may, might, could), and nominal pronouns (e. g.
someone, everything). Second, face-to-face and movie conversation also
share a relatively low frequency of stranded prepositions, verbs do and
be, wh-questions, wh-clauses, adverbials and hedges (e. g. almost, maybe),
contractions, adverbs (e. g. qualifiers and amplifiers such as absolutely
and entirely), and subordinating conjunctions (e. g. causative because).
Table 8. Means procedure of Dimension 1 in the LSAC.
Dimension 1
The following extracts show the frequency of these features in the two
corpora (see bold), and their frequent co-occurrence both in face-to-
face and movie conversation.
3.3 Narrative vs. Non-Narrative Concerns (Dimension 2)
Face-to-face conversation has a low occurrence of verbs in the perfect
aspect (perfects in the tables), public verbs (e. g. assert, complain, say,
report, declare – pub_vb), past tense verbs (pasttnse), and third person
pronouns except it (pro3) (cf. Table 10), which are all devices of narrative
discourse and have a positive weight on Dimension 2 (cf. Biber 1988:
109). Indeed, past tense and perfect aspect mark past events; public verbs
are used to indicate indirect, reported speech; third person pronouns
(except it) are used to refer to specific animate referents described in
narrative discourse (cf. Biber 1988: 109).
Table 10. Linguistic features of Dimension 2 in the LSAC.
Dimension 2
In much the same way, movie dialogues (cf. Table 12) display very
few occurrences of past tense verbs, third person pronouns, verbs in the
perfect aspect and public verbs (e. g. assert, complain, say).
Dimension 2
The following example from the AMC illustrates the low frequency
of these linguistic items (in bold) and highlights (in italics) those which
have a negative score on Dimension 2:
Extract 10b from the AMC:
Speaker1: Mrs. Larson? It uh it won’t be much longer, Mrs. Larson.
Speaker2: Oh well is he in a lot of pain?
Speaker1: No no no. There will be no more pain for your husband.
          He’s heavily sedated.
Speaker2: OK I think I’m gonna go, send little Hal in now.
Speaker1: No no no I don’t think that’s such a good idea. With all the
          painkillers uh the reverend’s not exactly himself.
Speaker2: Look I think my boy has a right to say goodbye to his father
          I mean the man means everything in the world to him.
Although neither face-to-face nor movie conversation is mainly
characterized by a high occurrence of past tense verbs, these tenses can
appear in conversation, especially when the talk takes a narrative or
reporting slant, as Extracts 11a and 11b demonstrate (see bold). It is
worth noting that, even under these circumstances, the features which
contribute to an interactional and interpersonal dialogic character
(highlighted in Dimension 1, cf. Section 3.2) are still recognizable.
And so he just young, this popular young woman coming in the neigh-
borhood that was a young teacher and he was a popular guy. You know
everybody was crazy about him. After the first wife he said oh she did
him so bad he wouldn’t marry again. But then when he saw this young
girl, everybody was inviting her into their homes and to different par-
ties and then she was all modern coming from Houston, you know so
he snapped her up and married her. He asked my dad for her hand
very politely. And daddy gave him a long lecture.
Yeah, Nate said it was great. He actually… He applied here, but they
wanted someone with more experience.
These extracts show that the various Dimensions and Factors of Multi-
Dimensional Analysis are closely related to one another. The linguistic
items which are not frequent and have a positive weight on Dimension
2, for example, are compensated by the high frequency of other items
which play an important part in Dimension 1: as Tables 12 and 13
illustrate, the low occurrence of past tense verbs (pasttnse) and of verbs
in the perfect aspect (perfects) is compensated by the high occurrences of
the present forms (pres). In much the same way, the low mean score of
public verbs (pub_vb) is compensated by a high mean score of private
verbs (prv_vb), as the low frequency of third person pronouns except it
(pro3) is compensated by the high frequency of first and second person
pronouns and possessives (pro1 and 2) and it pronouns (it).
Table 12. Comparisons between some of the linguistic features present in the LSAC
which compensate each other in Dimensions 1 and 2.
Table 13. Comparisons between some of the linguistic features present in the AMC
which compensate each other in Dimensions 1 and 2.
domains under investigation both have a negative mean score (-7.04 and -5.72
respectively, see Chart 4), which implies that they both rely on situation-
dependent reference (cf. Biber 1988).
26 This means that they are usually employed for references outside the text (Biber
1988: 110).
which indicate a type of referential and informational discourse.
Dimension 3
As Table 1 demonstrates, movie conversation also displays extremely
similar mean scores regarding the linguistic features associated with Dimension
3 and found in face-to-face conversation.
This means that, analogously to face-to-face conversation, movie
conversation has a negative score on Dimension 3 and, thus, by adopting
adverbs and pronouns, it relies on situation-dependent reference. These
features can be seen in the following extract, in which the adverbs and
pronouns are marked in bold:
Speaker1: I have a busy day today. Drinks then dinner. Don’t wait up
will you, darling?
Speaker2: I stopped waiting a long time ago, George
Speaker1: Oh and erm, that lunch tomorrow, cancel that too, will you?
Speaker2: Problems?
Speaker1: I doubt it. But Slever Key won’t stop calling. You know sci-
entists. They’re worse than models. You have to coddle them
all the time, like little children.
Table 17. Comparisons between compensating linguistic features in Dimensions 1
and 3 (in the AMC).
In linguistic terms, these low mean scores indicate that both the
conversational domains under investigation contain a low percentage of
elements that are typical of persuasion. As Tables 18 and 19
respectively display, and as all the extracts from the corpora above confirm,
both face-to-face and movie conversation have a low percentage of
infinitive verbs (inf in the table), modals of prediction (will, would, shall
– prd_mod), suasive verbs (e. g. ask, command, insist – sua_vb),
subordinating conjunctions – conditionals (e. g. if, unless – sub_cnd), modals of
necessity (e. g. ought, should, must – nec_mod), and adverbs within
auxiliary (i. e. splitting aux-verb – spl_aux) which usually carry weight in
persuasive language (Biber 1988). Infinitive verbs, for example, can be
used as adjectives and verb complements in expressions like happy to
do it; they encode “the speaker’s attitude or stance towards the
proposition encoded in the infinitival clause” (Biber 1988: 111). Modals are
direct pronouncements that certain events will (prediction), should
(obligation or necessity), can or might (possibility) occur. Suasive verbs,
for instance, imply intentions to make an event occur and conditional
subordination specifies the conditions required to do so. Split
auxiliaries (i. e. auxiliaries which occur with an adverb which is placed
between them and the main verb, like can often do) are often modals,
which explains why these features have weight on this dimension (cf.
Biber 1988: 111).
Table 18. Linguistic features of Dimension 4 in the LSAC.
Dimension 4
27 I. e. the maximum distance between positive and negative mean scores, here:
2.04+1.66=3.70.
rather similar, because the span difference27 between them is relatively slight
(3.70). Chart 6 clearly illustrates their closeness.
Table 21. Linguistic features of Dimension 5 in the AMC.
Dimension 5
Speaker1: Just relax. And I want you to imagine that you’re on a beach.
Speaker2: OK.
Speaker1: It’s a warm day and the sun is just starting to set. And you’re
looking in the eyes of a woman, and you’re feeling her heart.
You’re seeing her soul. You’re feeling her spirit. That’s it. That’s
it. Excellent. Excellent.
3.7 Summary
With respect to Dimension 4, namely Overt Expression of Persua-
sion, the data have proven that, even though the two conversational
domains have positive scores, they do not have a high percentage of
infinitive verbs, modals of prediction, suasive verbs, subordinating con-
junctions, modals of necessity, and adverbs within the auxiliary, which
have weight on this factor. Their mean scores are rather low and again
extremely similar (face-to-face conversation: 0.60 vs. movie conversa-
tion: 0.64).
The only difference that has emerged from Multi-Dimensional
Analysis regards Dimension 5, namely Abstract versus Non-abstract In-
formation: movie conversation has a positive score (1.66) and has con-
sequently been defined as abstract, whereas face-to-face conversation
has a negative score (-2.04) and has been labeled as non-abstract. De-
spite this difference in polarity, however, it has been pointed out that
neither of the two conversational domains has a high score on fea-
tures which contribute to abstract information, which means that the
difference between them, which corresponds to 3.70, is fairly mini-
mal: a low percentage of agentless passive verbs, passive verbs + by, and
passive postnominal modifiers characterize both face-to-face and movie
conversation. As a consequence, Dimension 5 should not be consid-
ered as particularly important in differentiating between the two con-
versational domains. The main difference has been ascribed to the rela-
tively higher presence of adverbial conjuncts in the movies compared
to face-to-face conversation, namely 6.83 and 1.37 respectively, which
have positive weights on this factor.
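The comparison of polarity and distance summarized above can be made concrete with a short Python sketch. The mean scores for Dimensions 4 and 5 are the ones reported in this summary; the helper names `same_polarity` and `span` are hypothetical and serve only to illustrate the arithmetic.

```python
def same_polarity(a: float, b: float) -> bool:
    """True if both mean scores lie on the same side of zero
    (a score of exactly 0 is treated as non-positive)."""
    return (a > 0) == (b > 0)


def span(a: float, b: float) -> float:
    """Distance between two mean scores on the same Dimension."""
    return abs(a - b)


# (face-to-face conversation, movie conversation), as reported in the summary.
mean_scores = {
    "Dimension 4": (0.60, 0.64),
    "Dimension 5": (-2.04, 1.66),
}

for name, (face_to_face, movie) in mean_scores.items():
    print(name, same_polarity(face_to_face, movie),
          round(span(face_to_face, movie), 2))
```

For Dimension 5 the two scores differ in polarity, and the distance between them is 2.04 + 1.66 = 3.70, the span difference quoted in the text.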
Table 22 and Chart 7 sum up the similarities which have emerged
from Multi-Dimensional Analysis: Table 22 recapitulates the main fea-
tures characterizing the five dimensions investigated, whereas Chart 7
outlines the closeness of the mean scores of face-to-face and movie con-
versation.
Table 22. Summary of the features that emerged from Multi-Dimensional Analysis.
Chart 7. Summary of the main charts that emerged from Multi-Dimensional Analysis.
The two conversational corpora examined are extremely similar also
in terms of the Dimension which characterizes them most: they
display the highest and most significant mean score in Dimension 1
(face-to-face conversation: 35.04 vs. movie conversation: 35.31).
Quantitatively, the same can be said for the mean scores characterizing the
other Dimensions (Dimension 5 excluded). As modeled in Chart 8,
it can thus be concluded that both face-to-face and movie
conversation belong to the same text type, which is primarily interpersonal,
affective, interactional, and generalized and secondly, non-narrative,
situation dependent, and not highly persuasive.

Chart 8. Multi-Dimensional Analysis results.
Chapter 4
Shot 2: Close-ups
4.1 Introduction
as abstract, whereas face-to-face conversation has a negative score and
is labeled as non-abstract. In spite of this difference, however, neither
of the two conversational types has a high mean score, which means
that the difference between them can be considered minimal.
The present section illustrates, through Multi-Dimensional Analy-
sis, that movie genre – specifically, comedy and non-comedy28 – does
not significantly influence the resemblance of movie conversation to
face-to-face conversation. The results of the Multi-Dimensional Analy-
sis illustrated in Table 23 reflect what has already emerged in the ana-
lysis described in Chapter 3, which did not take movie genre into
account: even though comedies are slightly more similar than non-
comedies to face-to-face conversation, both movie genres have four
Dimensions out of five in common with the latter.
sions 1, 2, 3, and 4, cf. bold in Table 23). Indeed, by excluding Dimen-
sion 5, which is the Dimension on which face-to-face and movie con-
versation mostly differ, and which neither comedies nor non-comedies
share with face-to-face conversation in terms of polarity, comedies are
closer to face-to-face conversation with regard to three Dimensions (i. e.
Dimensions 1, 2, and 4), whereas non-comedies are closer to it only with
respect to Dimension 3. The Dimension on which non-comedies differ
most from face-to-face conversation is Dimension 4. The following sec-
tions will offer specific details on the results concerning Dimensions 1–5.
Dimension 1 (+)
fname            that_del  contrac  pres   pro2  pro_do  p-dem  gen-emph  pro1  it
comedies.txt     8.3       7.5      114.7  54.6  4.2     13.0   11.6      75.2  19.0
noncomedies.txt  8.7       5.0      120.4  52.5  4.1     11.7   8.0       72.3  20.5

fname            be_state  sub_cos  prtcle  pany  gen_hdg  amplifr  wh-ques  pos_mod  o_and  wh_cl  finlprep
comedies.txt     2.7       1.2      8.7     7.1   1.9      2.7      5.6      7.1      11.3   2.4    4.4
noncomedies.txt  3.7       1.6      7.7     52.5  1.7      2.3      6.4      10.5     11.3   2.6    3.2
4.2.2 Narrative vs. Non-Narrative Concerns (Dimension 2)
Dimension 2 (-)
on it. More specifically, non-comedies have two linguistic features
which carry a positive weight on Factor 3 (relative clauses in subject
position and phrasal connectors, i. e. rel_subj and p_and), whereas
comedies have three: relative clauses in object position, wh-pronouns that
function as a relative clause in object position with prepositional
fronting, and nominalizations, i. e. rel_obj, rel_pipe, and n_nom
respectively. Furthermore, non-comedies have two linguistic items which
carry a negative weight (see labels in bold in Table 26) on Dimension 3
(time and place adverbs, i. e. tm_adv and pl_adv), whereas comedies have
only one (adverbs, i. e. advs).
Dimension 3 (-)
Table 27. Linguistic features of Dimension 4 in AMC comedies and non-comedies.
Dimension 4 (+)
Dimension 5 (-)
4.2.6 Summary
31 The centrality to linguistic analysis of the mutual relations between the
different levels of language and of meaning as function in context arises
from a tradition based on the pioneering work of John Rupert Firth and then
developed by the so-called neo-Firthians (i. e. Michael Halliday and John
Sinclair) and, later, by contemporary scholars such as Biber, Francis and
Hunston, Stubbs, Tognini-Bonelli, inter alia.
Table 29. Word lists of the LSAC and the AMC32.
LSAC AMC
N Word Per 1,000 W N Word Per 1,000 W
1 I 34.50 1 YOU 40.19
2 THE 29.76 2 I 32.07
3 YOU 27.59 3 THE 29.37
4 AND 26.59 4 TO 21.43
5 TO 23.50 5 A 21.12
6 IT 18.99 6 AND 14.34
7 THAT 18.20 7 THAT 13.27
8 A 18.14 8 IT 12.88
9 OF 12.52 9 OF 12.24
10 YEAH 10.90 10 IS 10.16
11 IN 10.69 11 IN 10.02
12 IS 10.33 12 ME 09.69
13 KNOW 09.68 13 WHAT 09.62
14 WAS 09.16 14 THIS 08.75
15 HAVE 08.77 15 NO 08.19
16 LIKE 08.58 16 I’M 07.77
17 SO 08.47 17 ON 07.71
18 WE 07.88 18 YOUR 07.15
19 OH 07.69 19 OH 07.11
20 THEY 07.59 20 HAVE 06.83
21 THIS 07.57 21 DO 06.77
22 IT’S 07.54 22 FOR 06.76
23 WHAT 07.27 23 DON'T 06.74
24 JUST 07.20 24 MY 06.70
25 DO 07.00 25 KNOW 06.69
26 BUT 06.94 26 WE 06.34
27 WELL 06.52 27 JUST 06.06
28 UH 06.44 28 IT’S 05.66
29 ON 06.35 29 NOT 05.59
30 HE 06.29 30 ALL 05.58
32 The numbers in the table are normalized to 1,000.
From a qualitative point of view, the analysis of the most frequent words
confirms the results of the Multi-Dimensional Analysis presented in the
previous chapter: both corpora are characterized by interpersonal
communication, given that both display a frequent use of first person
pronouns and possessives (I, we), second person pronouns and possessives
(you), private verbs (know), the pronoun it, demonstrative pronouns (that,
this), coordinating conjunctions (and), the verb do, and wh-expressions
(what).
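As an illustration of how a word list normalized to 1,000 words (as in Table 29) can be produced, the following is a minimal sketch. It is not the software used for the study: the tokenizer, the function names, and the sample sentence are assumptions made purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Upper-case the text and keep ASCII word tokens, including
    contracted forms such as IT'S (straight or curly apostrophe)."""
    return re.findall(r"[A-Z]+(?:['’][A-Z]+)*", text.upper())


def word_list(text, top=30):
    """Return (word, frequency per 1,000 words) pairs, most frequent first."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    return [(word, round(count * 1000 / total, 2))
            for word, count in counts.most_common(top)]


# Hypothetical sample; a real run would read a whole corpus file.
sample = "You know, I think it's fine. You know what I mean?"
for word, per_thousand in word_list(sample, top=3):
    print(word, per_thousand)
```

Normalizing by the total number of running words is what makes corpora of different sizes, such as the LSAC and the AMC, directly comparable.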
Table 30. Two-grams present in the LSAC and in the AMC (indicated in bold)33.
33 The numbers in the table are normalized to 1,000.
This claim is further supported by the fact that the most frequent lexi-
cal bundle in the two corpora is you know, even though the occur-
rences in the LSAC are twice as frequent (5.32) as those present in
the AMC (2.83). This is thought-provoking, since you know is very
frequent in conversation (Kennedy 1998, Biber et al. 1999), as illus-
trated in Chapter 1, and is usually considered part of the core spoken
language (McCarthy 1999, Erman 2001); consequently, its high fre-
quency in both corpora makes movie conversation similar to face-to-
face conversation also along this parameter. Furthermore, despite this
numerical difference, empirical data have proven (cf. Forchini 2009,
2010) that the patterning of you know, in terms of its general distri-
bution and function, is extremely similar in the two conversational
domains. As a matter of fact, you know occurs homogeneously and
especially in mid-position in the turn; it occurs less in initial position
and rarely in final position; and it is usually employed when the speaker
recounts or comments on something, often providing new informa-
tion or information that may be unknown to the listener. Finally, it
is also interesting to note that another characteristic shared by face-
to-face and movie conversation is the frequent co-occurrence of you
know with other discourse markers, interjections, and inserts. In par-
ticular, in face-to-face conversation, you know principally occurs with
I mean in the clusters I mean you know and you know I mean (cf. Ta-
ble 31). The high frequency of you know suggests that this two-gram
probably belongs to a larger cluster, which may correspond to the pat-
tern discourse marker/insert + you know or you know + discourse marker/
insert.
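The two-gram counts and the larger clusters around you know discussed above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the tokenizer, the function names, and the sample sentence are hypothetical, and turn boundaries are ignored.

```python
import re
from collections import Counter


def tokenize(text):
    """Upper-case the text and keep ASCII word tokens (apostrophes allowed)."""
    return re.findall(r"[A-Z]+(?:['’][A-Z]+)*", text.upper())


def ngram_frequencies(tokens, n):
    """Count contiguous n-grams and normalize them per 1,000 running words."""
    grams = Counter(" ".join(tokens[i:i + n])
                    for i in range(len(tokens) - n + 1))
    total = len(tokens)
    return {gram: count * 1000 / total for gram, count in grams.items()}


def clusters_containing(tokens, phrase, n):
    """Count the n-word clusters that contain `phrase` as a contiguous part."""
    target = phrase.upper().split()
    k = len(target)
    hits = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        if any(gram[j:j + k] == target for j in range(n - k + 1)):
            hits[" ".join(gram)] += 1
    return hits


# Hypothetical sample utterance, not taken from either corpus.
tokens = tokenize("Well you know I mean it was, you know, I mean, fine.")
two_grams = ngram_frequencies(tokens, 2)
clusters = clusters_containing(tokens, "you know", 4)
print(round(two_grams["YOU KNOW"], 2))
print(clusters.most_common(3))
```

The same cluster function can be run with a different `n` to see whether a frequent two-gram tends to sit inside a larger recurrent sequence.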
Table 31. Clusters of you know in the LSAC34.
N Cluster Per 1,000 W
1 YOU KNOW WHAT I 0.11
2 YOU KNOW I MEAN 0.06
3 WHAT I MEAN 0.05
4 YOU KNOW YOU KNOW 0.04
5 DO YOU KNOW WHAT 0.04
6 I MEAN YOU KNOW 0.04
7 YOU KNOW I DON’T 0.04
8 I DON'T KNOW 0.03
9 YOU KNOW AND I 0.03
10 YOU KNOW I THINK 0.03
11 YOU KNOW AND THEN 0.03
12 YOU KNOW I WAS 0.02
13 YOU KNOW IF YOU 0.02
14 YOU KNOW IT’S LIKE 0.02
15 WELL YOU KNOW I 0.02
16 YOU KNOW IT WAS 0.02
17 YOU KNOW WHAT I’M 0.02
18 UH HUH YOU KNOW 0.02
19 YOU KNOW UH HUH 0.02
20 WELL YOU KNOW WHAT 0.02
21 BUT YOU KNOW WHAT 0.02
22 I SAID YOU KNOW 0.02
23 DO YOU KNOW WHERE 0.01
24 AND I SAID 0.01
25 YOU KNOW WHAT YOU 0.01
26 AND YOU KNOW I 0.01
27 AND YOU KNOW WHAT 0.01
28 YOU KNOW I JUST 0.01
29 YOU KNOW I KNOW 0.01
30 BUT YOU KNOW I 0.01
34 The numbers in the table are normalized to 1,000.
A similar pattern emerges from the movie dialogue corpus: even though
there are only 0.30 occurrences (per thousand words) of the cluster
you know I mean in the AMC, its frequent co-occurrence with other
discourse markers, interjections and inserts is further confirmed by the
left (L1) and right (R1) collocates of you know in both corpora. These
collocates are usually expressions like uh, um, oh, well, like, yeah, just
(cf. Tables 32 and 33). In the corpus of face-to-face conversation, there
are more R1 collocates than in the AMC. This may simply be due to a
difference in the corpora size or to a difference in genre, meaning that
in movies you know might belong only to the discourse marker/insert +
you know and not to the you know + discourse marker/insert cluster, as it
does in face-to-face conversation.
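The L1 and R1 collocates of a node such as you know can be counted with a short routine like the one below. It is a sketch under stated assumptions: the function names and the sample text are hypothetical, the marker set simply reuses the ten words listed in Table 32, and no claim is made that this mirrors the concordancing software used for the study.

```python
import re
from collections import Counter

# Marker set taken from the words listed in Table 32.
MARKERS = {"WELL", "BUT", "LIKE", "UH", "SO", "YEAH", "UM", "JUST", "MEAN", "OH"}


def tokenize(text):
    """Upper-case the text and keep ASCII word tokens (apostrophes allowed)."""
    return re.findall(r"[A-Z]+(?:['’][A-Z]+)*", text.upper())


def l1_r1_collocates(tokens, node, markers=MARKERS):
    """Count the marker words found immediately to the left (L1) and
    immediately to the right (R1) of every occurrence of `node`."""
    node_words = node.upper().split()
    k = len(node_words)
    left, right = Counter(), Counter()
    for i in range(len(tokens) - k + 1):
        if tokens[i:i + k] != node_words:
            continue
        if i > 0 and tokens[i - 1] in markers:
            left[tokens[i - 1]] += 1
        if i + k < len(tokens) and tokens[i + k] in markers:
            right[tokens[i + k]] += 1
    return left, right


# Hypothetical sample, not taken from either corpus.
sample = "Well you know, like, it was late. But you know, uh, I stayed. Well you know."
left, right = l1_r1_collocates(tokenize(sample), "you know")
print("L1:", dict(left))
print("R1:", dict(right))
```

Dividing each count by the corpus size and multiplying by 1,000 would give figures comparable to those reported in Tables 32 and 33.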
Table 32. The 10 most frequent L1 and R1 discourse marker/insert collocates of
you know in the LSAC35.
Word L1 R1
WELL 1.70
BUT 1.50 0.50
LIKE 1.10 1.00
UH 0.90 0.40
SO 0.60 0.50
YEAH 0.50 0.40
UM 0.50
JUST 0.40
MEAN 0.40 0.40
OH 0.40
Table 33. L1 and R1 discourse marker/insert collocates of you know in the AMC36
35 The numbers in the table are normalized to 1,000.
36 The numbers in the table are normalized to 1,000.
Similarly, as Table 34 illustrates, face-to-face and movie conversation
have similar features also in terms of four-grams: although they do not
share many of them, there are some (such as I want you to and you want me
to / do you want me; do you know why and I don’t know why; I thought you
were and I thought it was) which are clearly interchangeable and
functionally similar. Together with other bundles which have personal
pronouns (e. g. you know and I don’t), for example, these four-grams
highlight the interpersonal function which is typical of spoken language.
Table 34. LSAC and AMC four-grams37 (similar and/or functionally similar
four-grams in bold).
LSAC AMC
N Word Per 1,000 W N Word Per 1,000 W
1 # WHITE YES # 0.30 1 I WANT YOU TO 0.23
2 I DON’T KNOW IF 0.19 2 WHAT ARE YOU DOING 0.23
3 I DON’T KNOW WHAT 0.18 3 I DON’T KNOW WHAT 0.20
4 COLLEGE # WHITE YES 0.18 4 NO NO NO NO 0.16
5 DO YOU WANT TO 0.16 5 WHAT DO YOU MEAN 0.15
6 I DON’T WANT TO 0.13 6 ARE YOU TALKING ABOUT 0.14
7 YOU DON’T HAVE TO 0.11 7 BOMB BOMB BOMB BOMB 0.13
8 BA BS COLLEGE # 0.11 8 YOU WANT ME TO 0.13
9 YOU KNOW WHAT I 0.10 9 WHAT DO YOU THINK 0.12
10 YOU WANT ME TO 0.10 10 COME ON COME ON 0.11
11 I DON’T KNOW I 0.10 11 WHAT ARE YOU TALKING 0.11
12 I WAS GOING TO 0.10 12 NICE TO MEET YOU 0.10
13 I DON’T KNOW HOW 13 ARE YOU DOING HERE
14 SOME COLLEGE # WHITE 14 WHAT THE HELL IS
15 IF YOU WANT TO 15 HOW DO YOU KNOW
16 BS COLLEGE # WHITE 16 THE HELL ARE YOU
17 I DON’T THINK SO 17 TO TALK TO YOU
18 OR SOMETHING LIKE THAT 18 YOU DON’T HAVE TO
19 ARE YOU GOING TO 19 A LONG TIME AGO
20 # # TO # 20 DO YOU KNOW WHY
21 WELL I DON’T KNOW 21 I DON’T HAVE A
22 AND I WAS LIKE 22 I DON’T WANT TO
23 I DON’T KNOW WHY 23 I JUST WANTED TO
24 MA MS # WHITE 24 I KNOW I KNOW
25 MS # WHITE YES 25 I THOUGHT YOU WERE
26 DO YOU WANT ME 26 IF I TOLD YOU
27 WHAT ARE YOU DOING 27 LET ME ASK YOU
28 BUT I DON’T KNOW 28 SO WHAT ARE YOU
29 #### 29 THE REST OF THE
30 I THOUGHT IT WAS 30 WHAT DO YOU WANT
Another similarity emerges from the fact that most of the four-grams
shared by both corpora are verb phrase fragments, such as I don’t know
what, you want me to, etc. (cf. Table 34), which are, of course, typical
of spoken language (cf. Biber 2009).
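Identifying the four-grams that two lists have in common amounts to a simple intersection, which can be sketched as below. The two lists are partial selections copied from Table 34; the function name is hypothetical and the sketch is offered only as an illustration.

```python
def shared_ngrams(list_a, list_b):
    """Return the n-grams of list_a that also occur in list_b,
    keeping the order in which they appear in list_a."""
    in_b = set(list_b)
    return [gram for gram in list_a if gram in in_b]


# Partial selections from Table 34 (not the complete top-30 lists).
lsac = [
    "I DON'T KNOW IF",
    "I DON'T KNOW WHAT",
    "DO YOU WANT TO",
    "I DON'T WANT TO",
    "YOU DON'T HAVE TO",
    "YOU WANT ME TO",
    "WHAT ARE YOU DOING",
]
amc = [
    "I WANT YOU TO",
    "WHAT ARE YOU DOING",
    "I DON'T KNOW WHAT",
    "YOU WANT ME TO",
    "YOU DON'T HAVE TO",
    "I DON'T WANT TO",
]

for gram in shared_ngrams(lsac, amc):
    print(gram)
```

An exact-match intersection of this kind only finds identical sequences; functionally similar pairs such as I thought you were and I thought it was still have to be identified by qualitative inspection.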
Table 35. Multi-word formulaic sequences composed of content words in the AMC.
Furthermore, a closer investigation of content words shows that in
spoken language the words “know, think (thought), and what occur as
content words in almost 50% of the conversational multi-word sequences
that include a content word of any kind” (Biber 2009: 300). In much the
same way, the most frequent verbs in the multi-word sequences found in
the AMC are know, want, talk (not mentioned by Biber), and think
(thought). These patterns, and their raw occurrences, are illustrated in
Table 36.
Table 38. Most Frequent Patterns with External and Internal Variable Slot (*) in the AMC
TOTAL TOTAL
TYPES 16 TYPES 6
TOKENS 346 TOKENS 113
75.38% 24.61%
4.3.4 Summary
– both movie and face-to-face conversation display continuous se-
quences of fixed elements, with a preceding or following variable
slot (e. g. 123*).
It may thus be concluded that, since the word lists, two-grams, four-
grams, and formulaic patterns that are typical of spoken language have
also been found in movie conversation, the two conversational domains
are significantly similar. On a functional level, this finding reflects the
interpersonal trait which emerged from the Multi-Dimensional Analy-
sis in Chapter 3.
Chapter 5
Closing Credits: Implications and Applications
models, and which may cause us to revise our ideas very substantially. In all of
this my plea is to trust the text (Sinclair 2004a: 23).
5.1.1 A Reflection of Spoken Language
– the most frequent words, two-grams, and four-grams;
– certain two-grams and four-grams which highlight the interpersonal
function brought to light by Multi-Dimensional Analysis;
– content words and function words in multi-word sequences which
are equally likely to be fixed;
– most of the four-word sequences composed of one content word
+ three function words;
– most of the frequent verbs in content word sequences;
– continuous sequences of fixed elements, with a preceding or follow-
ing variable slot.
On the whole, the functional interpretation of these specific features
reflects the results that emerged within the more general Multi-
Dimensional Analysis and especially highlights the importance of
Dimension 1: the most frequent words, two-grams and four-grams in
both face-to-face and movie conversation offer further proof of the
interpersonal function which characterizes these domains. Besides, their
common preference for function words and the use of clauses, for
example, not only reflects the results of Multi-Dimensional Analysis,
which has shown that that-clauses, wh-clauses, causative adverbial
clauses, and conditional adverbial clauses (i. e. finite dependent clauses) are
characteristic of interpersonal spoken registers, but also demonstrates
that the grammar of movie and face-to-face conversation does not
differ.
help” (Mauranen 2004: 103). This is promising as regards the future
utility of the AMC or of other similar corpora. Such corpora are an
effective resource for teaching features of spoken language, as demon-
strated by experiments conducted with Italian university students study-
ing English (Forchini forthcoming). The students in question success-
fully acquired features of spoken discourse, such as elisions, blends, repetitions,
false starts, reformulations, discourse markers, and interjections, and were
also highly motivated by the use of movies.
In addition to the advantages of using movies with students, the
relative simplicity of accessing movie DVDs and transcribing movie
speech, compared to the complications of collecting spoken data (cf.
also Biber et al. 1999: 1041, McCarthy 1999: 13, Halliday 2005: 162),
should not be ignored. One disadvantage of spoken corpora, indeed, is
that they are laborious to compile (Halliday 2003c, Mauranen 2004).
First, representative speakers who agree to be recorded have to be found;
second, they have to be recorded in such a way that their recording can
then be easily accessed and heard; finally, these recordings need to be
transcribed in order to be searchable with corpus linguistic software.
Using movie language to study spoken features also offers a possible
solution to the problem of having authentic teaching material.
The present study has pointed out the linguistic resemblance of movie
conversation to face-to-face conversation and confutes the claim that
movie language has “a very limited value” and that it is “not likely to
be representative of the general usage of conversation” (cf. Sinclair
2004b: 80).
It cannot be ignored that the motion-picture medium imposes cer-
tain non-spontaneous traits on movie language, limiting it in terms of
total spontaneity, but in this study I have shown that the current view
of movie dialogue as being non-representative of the general usage of
conversation needs to be re-considered. The surprising results of this
empirical study are in direct contrast with what has been proclaimed in
the literature for thirty years and require a revision of both the notion
that movie language has limited value and the label artificial, which
scholars have often given to this type of conversational domain.
The main implication of the striking similarity between the com-
municative functions and linguistic features characterizing face-to-face
and movie conversation is that, from a theoretical point of view, it
legitimates the use of movie language to teach features of spoken lan-
guage. On the practical side, given the relative ease of collecting movie
conversation material compared to collecting authentic spoken mate-
rial, the use of movie language as a source of teaching spoken features
becomes appealing. Lastly, there is the universal attraction that movies
hold for viewers, which should not be underestimated.
In terms of future directions, it could be interesting to investigate
movies further to see whether other genre categories produce different
results. The study could also be broadened out to include and compare
other varieties of English. The advantages of exploring real movie data
and the considerable potential that movie language offers for the study
of the spoken language, however, have been amply exemplified.
Appendices
Appendix 1. Linguistic Features Codes (Multi-Dimensional Analysis)
{Negative Dimension 1}
22 = n Noun
23 = prep Preposition
24 = adj_attr Attributive Adjective
{Dimension 2}
25 = pasttnse Past Tense Verb
26 = pro3 Third person pronoun (except ‘it’)
27 = perfects Verb – Perfect Aspect
28 = pub_vb Public Verbs (e. g. assert, complain, say)
{Dimension 3}
29 = rel_obj Wh pronoun – relative clause – object position
30 = rel_subj Wh pronoun – relative clause – subject position
31 = rel_pipe Wh pronoun – relative clause – object position with prepositional fronting (‘pied piping’)
32 = p_and Coordinating conjunction – phrasal connector
33 = n_nom Singular noun – nominalization
34 = tm_adv Adverb – Time
35 = pl_adv Adverb – Place
36 = advs Adverb (not including counts 8, 15, 16, 34, 35, 49)
{Dimension 4}
37 = inf Infinitive Verb
38 = prd_mod Modal of prediction (will, would, shall)
39 = sua_vb Suasive Verb (e. g. ask, command, insist)
40 = sub_cnd Subordinating conjunction – conditional (e. g. if, unless)
41 = nec_mod Modal of necessity (ought, should, must)
42 = spl_aux Adverb within auxiliary (splitting aux-verb)
{Dimension 5}
43 = conjncts Adverbial – conjuncts (e. g. however, therefore, thus)
44 = agls_psv Agentless passive verb
45 = by_pasv Passive verb + by
46 = whiz_vbn Passive postnominal modifier
47 = sub_othr Subordinating conjunction – Other (e. g. as, except, until)
Appendix 2. Face-to-Face Conversation Means Procedure (Multi-Dimensional Analysis)38
38 Means per 1,000 words.
Appendix 3. Movie Conversation Means Procedure (Multi-Dimensional Analysis)39
40 Means per 1,000 words.
Appendix 5. Multi-Dimensional Analysis of Borderline Genre Movies
References
Aarts, Bas. 2001. Corpus linguistics, Chomsky and fuzzy tree fragments. In Corpus
linguistics and linguistic theory, ed. Christian Mair and Marianne Hundt.
Amsterdam / Atlanta: Rodopi, 5–13.
Aijmer, Karin and Anna-Brita Stenström. 2005. Approaches to spoken interaction.
Journal of Pragmatics 37: 1743–1751.
Atkinson, Dwight. 2001. Scientific discourse across history: A combined multi-
dimensional / rhetorical analysis of the philosophical transactions of the Royal
Society of London. In Variation in English: Multi-dimensional studies, ed. Susan
Conrad and Douglas Biber. London: Longman, 45–65.
Austin, John Langshaw. 1962. How to do things with words. Oxford: Clarendon Press.
Baccolini, Raffaella and Rosa Maria Bollettieri Bosinelli, eds. 1994. Il doppiaggio:
trasposizioni linguistiche e culturali. Bologna: CLUEB.
Bazzanella, Carla. 1990. Phatic connectives as intonational cues in contemporary
spoken Italian. Journal of Pragmatics 14(4): 629–647.
Bazzanella, Carla. 1999. Forme di ripetizione e processi di comprensione nella
conversazione. In La conversazione. Un’introduzione allo studio della
conversazione verbale, ed. Renata Galatolo and Gabriele Pallotti. Milano:
Raffaello Cortina Editore, 205–225.
Bercelli, Fabrizio. 1999. Analisi conversazionale e analisi dei frame. In La
conversazione. Un’introduzione allo studio della conversazione verbale, ed.
Renata Galatolo and Gabriele Pallotti. Milano: Raffaello Cortina Editore,
89–118.
Bettetini, Gianfranco. 2002 (4th edition). La conversazione audiovisiva. Problemi
dell’enunciazione filmica e televisiva. Milano: Studi Bompiani.
Bernardini, Silvia. 2004. Corpora in the classroom: An overview and some re-
flections on future developments. In How to use corpora in language teaching,
ed. John McHardy Sinclair. Amsterdam / Philadelphia: John Benjamins Long-
man, 15–36.
Biber, Douglas. 1985. Investigating macroscopic textual variation through multi-
feature / multi-dimensional analyses. Linguistics 23: 337–60.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge
University Press.
Biber, Douglas. 1993. Representativeness in corpus design. Literary and Linguistics
Computing 8(4): 243–57.
Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison.
Cambridge: Cambridge University Press.
Biber, Douglas. 2004. Conversation text types: A multi-dimensional analysis. 7es
Journées internationales d’Analyse statistique des Données Textuelles JADT’04,
<http://www.cavi.univ-paris3.fr/lexicometrica/jadt/jadt2004/pdf/JADT_000.
pdf>. Last accessed on 27/07/2011.
Biber, Douglas. 2006. University language: A corpus-based study of spoken and writ-
ten registers. Amsterdam / Philadelphia: John Benjamins.
Biber, Douglas. 2009. A corpus-driven approach to formulaic language in English.
Multi-word patterns in speech and writing. International Journal of Corpus Lin-
guistics 14(3): 275–311.
Biber, Douglas and Edward Finegan. 1986. An initial typology of English texts. In
New studies in the analysis and exploitation of computer corpora, ed. Jan Aarts
and Willem Meijs. Amsterdam: Rodopi, 19–45.
Biber, Douglas and Edward Finegan. 2001a. Diachronic relations among speech-
based and written registers. In Variation in English: Multi-dimensional studies,
ed. Susan Conrad and Douglas Biber, 66–83. London: Longman.
Biber, Douglas and Edward Finegan. 2001b. Intra-textual variation within medi-
cal research articles. In Variation in English: Multi-dimensional studies, ed. Susan
Conrad and Douglas Biber. London: Longman, 108–123.
Biber, Douglas, Susan Conrad, and Randi Reppen. 1998. Corpus linguistics: inves-
tigating language structure and use. Cambridge: Cambridge University Press.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad, and Edward Finegan.
1999. Longman grammar of spoken and written English. London: Longman.
Bollettieri Bosinelli, Rosa Maria (ed). 1998. La traduzione multimediale: quale tradu-
zione per quale testo? Atti del convegno internazionale: La traduzione multi-
mediale. Bologna: CLUEB.
Börjars, Kersti. 2006. Description and theory. In The handbook of English
linguistics, ed. Bas Aarts and April McMahon. Malden: Blackwell, 9–32.
Brown, Penelope and Stephen Levinson. 1987. Politeness: Some universals in lan-
guage usage. Cambridge: Cambridge University Press.
Bruti, Silvia. 2006. Cross-cultural pragmatics: the translation of implicit compli-
ments in subtitles. JoSTrans, Issue 06, <http://www.jostrans.org/issue06/art_
bruti.php>. Last accessed on 27/07/2011.
Bruti, Silvia and Elisa Perego. 2005. Translating the expressive function in subtitles:
the case of vocatives. In Research on translation for subtitling in Spain and Italy,
ed. John D. Sanderson. Alicante: Publicaciones de la Universidad de Alicante,
27–48.
Bubel, Claudia. 2008. Film audience as overhearers. Journal of Pragmatics 40: 55–71.
Cameron, Deborah. 2001. Working with spoken discourse. London: Sage Publica-
tions Ltd.
Carter, Ronald and Michael McCarthy. 2006. Cambridge grammar of English: A
comprehensive guide. Spoken and written English: Grammar and usage. Cam-
bridge: Cambridge University Press.
Cattrysse, Patrick. 2001. Multimedia & translation: Methodological considerations.
In (Multi)Media translation. Concepts, practices and research, ed. Henrik Gottlieb
and Yves Gambier, 1–12. Amsterdam / Philadelphia: John Benjamins.
Chafe, Wallace. 1982. Integration and involvement in speaking, writing, and oral
Literature. In Spoken and written Language: Exploring orality and literacy,
ed. Deborah Tannen. Norwood / New Jersey: Ablex Publishing Corporation,
35–53.
Chaume, Frederic. 2004a. Cine y traducción. Cátedra: Signo e Imagen.
Chaume, Frederic. 2004b. Discourse markers in audiovisual translating. Meta XLIX(4):
843–855, <http://www.erudit.org/revue/meta/2004/v49/n4/009785ar.pdf>.
Last accessed on 27/07/2011.
Chaume, Frederic. 2004c. Film studies and translation studies: Two disciplines at
stake in audiovisual translation. Meta XLIX(1): 12–24, <http://www.erudit.
org/revue/meta/2004/v49/n1/009016ar.pdf>. Last accessed on 27/07/2011.
Chomsky, Noam. 1957. Syntactic structures. Berlin: Mouton de Gruyter.
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge: The MIT Press.
Clark, Herbert H. and Edward F. Schaefer. 1992. Dealing with overhearers. In
Arenas of language use, ed. Herbert H. Clark. Chicago: University of Chicago
Press, 248–273.
Conrad, Susan. 2001. Variation among disciplinary texts: a comparison of texts
about American nuclear arms policy. In Variation in English: Multi-dimensional
studies, ed. Susan Conrad and Douglas Biber, 84–93. London: Longman.
Contento, Silvana. 1999. Attività Bimodale: aspetti verbali e gestuali della
comunicazione. In La conversazione. Un’introduzione allo studio della
conversazione verbale, ed. Renata Galatolo and Gabriele Pallotti. Milano:
Raffaello Cortina Editore, 267–286.
Erman, Britt. 1987. Pragmatic Expressions in English: A study of you know, you see,
and I mean in face-to-face conversation. Stockholm: Almqvist & Wiksell Inter-
national.
Firth, John Rupert. 1935a. The technique of semantics. Transactions of the Philologi-
cal Society 36–72.
Firth, John Rupert. 1935b. The use and distribution of certain English sounds. Eng-
lish Studies xvii(I): 8–18.
Firth, John Rupert. 1951a. General linguistics and descriptive grammar. Transactions
of the Philological Society 216–228.
Firth, John Rupert. 1951b. Modes of meaning. In Essays and Studies, ed. John
Rupert Firth. English Association, 118–149.
Firth, John Rupert. 1957a. A synopsis of linguistic theory, 1930–1955. In Studies
in Linguistic Analysis, ed. John Rupert Firth et al. Special volume of the Philo-
logical Society. Oxford: Blackwell, 1–32.
Firth, John Rupert. 1957b. Papers in linguistics 1934–1951. London: Oxford Uni-
versity Press.
Forchini, Pierfranca. 2009. Spontaneity reloaded: American face-to-face and movie
conversation compared. Corpus Linguistics 2009 [Liverpool, July 21–23, 2009]
online proceedings <http://ucrel.lancs.ac.uk/publications/cl2009/>. Last
accessed on 27/07/2011.
Forchini, Pierfranca. 2010. Well, uh no. I mean, you know. Discourse markers in
movie conversation. In Perspectives on Audiovisual Translation, ed. Lukasz
Bogucki. Bern: Peter Lang, 45–59.
Forchini, Pierfranca. Forthcoming. Movie conversation: a reflection of face-to-face
conversation and a source for teaching spoken language. In Papers from the
XXIV AIA Conference Proceedings, eds. Gabriella Di Martino, Linda Lombardo,
Silvia Nuccorini, Edizioni Q: Roma.
Forchini, Pierfranca and Murphy, Amanda. 2010. 4-grams in comparable special-
ized corpora: perspectives on phraseology, translation, and pedagogy. In Pat-
terns, meaningful units and specialized discourses, eds. Ute Römer, Rainer
Schulze. Amsterdam / Philadelphia: Benjamins Current Topics, 87–103.
Ford, Cecilia A., Barbara A. Fox, and Sandra A. Thompson (eds.). 2002. The
language of turn and sequence. Oxford: Oxford University Press.
Francis, Gill. 1993. A corpus-driven approach to grammar: principles, methods and
examples. In Text and technology. In honour of John Sinclair, ed. Mona Baker,
Gill Francis and Elena Tognini-Bonelli. Amsterdam / Philadelphia: John Benja-
mins, 137–156.
Gavioli, Laura. 1999. Alcuni meccanismi di base dell’analisi della conversazione.
In La conversazione. Un’introduzione allo studio della conversazione verbale, ed.
Renata Galatolo and Gabriele Pallotti. Milano: Raffaello Cortina Editore,
43–66.
Goffman, Erving. 1976. Replies and responses. Language in Society 5: 257–313.
Goffman, Erving. 1979. Footing. Semiotica 25: 1–29.
Gottlieb, Henrik and Yves Gambier, eds. 2001. Multi-media translation: concepts,
practices, and research. Amsterdam / Philadelphia: John Benjamins.
Gregory, Michael and Suzanne Carroll. 1978. Language and situation: Language va-
rieties and their social contexts. London: Routledge & Kegan Paul.
Grice, Paul Herbert. 1975. Logic and Conversation. In Speech acts (Syntax and Se-
mantics, Vol. 3), ed. Peter Cole and Jerry L. Morgan. New York: Academic
Press, 41–58.
Halliday, Michael Alexander Kirkwood. 1985a. An introduction to functional gram-
mar. London: Arnold.
Halliday, Michael Alexander Kirkwood. 1985b. Spoken and written language. Ox-
ford: Oxford University Press.
Halliday, Michael Alexander Kirkwood. 1987. Spoken and Written Modes of Mean-
ing. In Comprehending Oral and written language, ed. Rosalind Horowitz and
Jay Samuels. Orlando: Academic Press, 55–82.
Halliday, Michael Alexander Kirkwood. 1993. Quantitative studies and probabili-
ties in grammar. In Data, description, discourse. Papers on the English language
in honour of John Sinclair, ed. Michael Hoey. London: HarperCollins, 1–25.
Halliday, Michael Alexander Kirkwood. 2003a. Introduction: On the “architecture”
of human language. In On language and linguistics, ed. Jonathan J. Webster.
London / New York: Continuum, 1–32.
Halliday, Michael Alexander Kirkwood. 2003b (first printed in 1985). Systemic
background. In On language and linguistics, ed. Jonathan J. Webster. London /
New York: Continuum, 185–198.
Halliday, Michael Alexander Kirkwood. 2003c (first printed in 1992). Systemic
grammar and the concept of a “Science of Language”. In On language and
linguistics, ed. Jonathan J. Webster. London / New York: Continuum, 199–
212.
Halliday, Michael Alexander Kirkwood. 2005 (first printed in 2002). The spoken
language corpus: a foundation for grammatical theory. In Computational and
quantitative studies, ed. Jonathan J. Webster. London / New York: Continuum,
157–190.
Helt, Marie E. 2001. A multi-dimensional comparison of British and American
spoken English. In Variation in English: Multi-dimensional studies, ed. Susan
Conrad and Douglas Biber. London: Longman, 171–183.
Higgins, John. 1991. Looking for patterns. In Classroom concordancing, ed. Tim
Johns and Philip King. Birmingham: Birmingham University Press, 4: 63–70.
Hoffmann, Sebastian. 2004. Are low-frequency complex prepositions grammatical-
ized? On the limits of corpus-data – and the importance of intuition. In Corpus
approaches to grammaticalization in English, ed. Hans Lindquist and Christian
Mair. Amsterdam / Philadelphia: John Benjamins, 171–210.
Hunston, Susan. 2002. Corpora in applied linguistics. Cambridge: Cambridge Uni-
versity Press.
Hunston, Susan. 2006. Phraseology and system: A contribution to the debate. In
System and Corpus: Exploring Connections, ed. Susan Hunston and Geoff
Thompson. London: Equinox Publishing, 55–80.
Johansson, Stig. 1993. Some aspects of the recommendations of the Text Encod-
ing Initiative, with special reference to the encoding of language corpora. In
Corpora Across Centuries, ed. Merja Kytö, Susan Wright, and Matti Rissanen.
Amsterdam / Atlanta: Rodopi, 203–210.
Johansson, Stig. 2007. Seeing through multilingual corpora. In Corpus linguistics
25 years on, ed. Roberta Facchinetti. Amsterdam / New York: Rodopi, 51–72.
Kennedy, Graeme. 1998. An introduction to corpus linguistics. London / New York:
Longman.
Mahlberg, Michaela. 2006. But it will take time… points of view on a lexical gram-
mar of English. In The changing faces of corpus linguistics, ed. Antoinette Renouf
and Andrew Kehoe. Amsterdam / New York: Rodopi, 377–390.
Mansfield, Gillian. 2006. Changing channels. Media language in (inter)action.
Milano: LED.
Mauranen, Anna. 2004. Spoken corpus for an ordinary learner. In How to use
corpora in language teaching, ed. John McHardy Sinclair. Amsterdam /
Philadelphia: John Benjamins, 89–105.
May, Renato. 1962. Cinema e linguaggio. Brescia: La Scuola Editrice.
McCarthy, Michael. 1998. Spoken language and applied linguistics. Cambridge: Cam-
bridge University Press.
McCarthy, Michael. 1999. What constitutes a basic vocabulary for spoken com-
munication? Studies in English language and literature 1: 233–249.
McEnery, Tony and Andrew Wilson. 1996. Corpus linguistics. Edinburgh: Edin-
burgh University Press.
Miller, Jim. 2006. Spoken and Written English. In The handbook of English linguistics,
ed. Bas Aarts and April McMahon. Malden / Oxford: Blackwell, 670–691.
Miller, Jim and Regina Weinert. 1998. Spontaneous spoken language. Oxford:
Clarendon.
Menarini, Alberto. 1955. Il cinema nella lingua, la lingua nel cinema. Milano / Roma:
Fratelli Bocca Editori.
Morandini, Laura, Luisa Morandini, and Morando Morandini. 2006. Il Morandini
2007: Dizionario dei film. Bologna: Zanichelli.
Nencioni, Giovanni. 1976. Parlato-parlato, parlato-scritto, parlato-recitato. Stru-
menti linguistici, 29: 1–56.
Nencioni, Giovanni. 1983. Di scritto e di parlato. Discorsi linguistici. Bologna: Zani-
chelli.
Partington, Alan. 1998. Patterns and meanings. Using corpora for English language
research. Amsterdam / Philadelphia: John Benjamins.
Pavesi, Maria. 1994. Osservazioni sulla linguistica del doppiaggio. In Il doppiaggio:
trasposizioni linguistiche e culturali, ed. Raffaella Baccolini and Rosa M. Bol-
lettieri Bosinelli. Bologna: CLUEB, 129–142.
Pavesi, Maria. 1996. L’allocuzione nel doppiaggio dall’inglese all’italiano. In
Traduzione multimediale per il cinema, la televisione e la scena. Atti del convegno
internazionale (Forlì, October 26–28 1995), ed. Christine Heiss and Rosa M.
Bollettieri Bosinelli. Bologna: CLUEB, 117–130.
Pavesi, Maria. 2005. La Traduzione Filmica. Aspetti del parlato doppiato dall’inglese
all’italiano. Roma: Carocci.
Pavesi, Maria and Annalisa Malinverno. 2000. Sul turpiloquio nella traduzione
filmica. In Tradurre il cinema, ed. Christopher Taylor. Trieste: La Stea, 75–90.
Quaglio, Paulo. 2009. Television Dialogue. The sitcom Friends vs. natural conversa-
tion. Amsterdam / Philadelphia: John Benjamins.
Quaglio, Paulo and Douglas Biber. 2006. The grammar of conversation. In The
handbook of English linguistics, ed. Bas Aarts and April McMahon, 692–723.
Malden / Oxford: Blackwell.
Redeker, Gisela. 2006. Discourse markers as attentional cues at discourse
transitions. In Approaches to discourse particles, ed. Kerstin Fischer. Amsterdam:
Elsevier, 339–358.
Remael, Aline. 2001. Some thoughts on the study of multimodal and multimedia
translation. In (Multi)Media Translation. Concepts, practices and research, ed.
Henrik Gottlieb and Yves Gambier. Amsterdam / Philadelphia: John Benjamins,
13–22.
Renouf, Antoinette. 1997. Teaching corpus linguistics to teachers of English. In
Teaching and language corpora, ed. Anne Wichmann, Steven Fligelstone, Tony
McEnery, and Gerry Knowles. London / New York: Longman, 255–266.
Renouf, Antoinette. 2007. Corpus linguistics 25 years on: from super-corpus to
cyber-corpus. In Corpus linguistics 25 years on, ed. Roberta Facchinetti. Amster-
dam / New York: Rodopi, 27–50.
Reppen, Randi. 2001a. Register variation in student and adult speech and writ-
ing. In Variation in English: Multi-dimensional studies, ed. Susan Conrad and
Douglas Biber. London: Longman, 187–199.
Reppen, Randi. 2001b. Review of MonoConc Pro and WordSmith Tools. Language
Learning & Technology 5(3): 32–36.
Reppen, Randi. 2010. Using Corpora in the Language Classroom. Cambridge: Cam-
bridge Language Education.
Rey, Jennifer M. 2001. Changing gender roles in popular culture: Dialogue in Star
Trek episodes from 1966 to 1993. In Variation in English: Multi-dimensional
studies, ed. Susan Conrad and Douglas Biber, 138–155. London: Longman.
Rossi, Alessandra. 2003. La lingua del cinema. In La lingua italiana e i mass me-
dia, ed. Ilaria Bonomi, Andrea Masini and Silvia Morgana. Roma: Carocci
Editore, 93–126.
Sacks, Harvey. 1992. Lectures on conversation. Oxford: Blackwell.
Schiffrin, Deborah. 1987. Discourse markers. Cambridge: Cambridge University
Press.
Scott, Mike. 1998. WordSmith Tools. Oxford: Oxford University.
<http://www.lexically.net/wordsmith/step_by_step/index.html>. Last accessed on 27/07/2011.
Scott, Mike and Chris Tribble. 2006. Textual patterns. Amsterdam / Philadelphia:
John Benjamins.
Searle, John R. 1969. Speech acts. Cambridge: Cambridge University Press.
Sinclair, John McHardy. 1991. Corpus concordance collocation. Oxford: Oxford Uni-
versity Press.
Sinclair, John McHardy. 1996. The search for units of meaning. Textus IX: 75–106.
Sinclair, John McHardy. 1999. A way with common words. In Out of corpora: studies
in honour of Stig Johansson, ed. Hilde Hasselgård and Signe Oksefjell. Amster-
dam: Rodopi, 157–179.
Sinclair, John McHardy. 2004a. Trust the text: Language, corpus and discourse. Lon-
don / New York: Routledge.
Sinclair, John McHardy. 2004b (first printed in 1987). Corpus creation. In Corpus
linguistics: Readings in a widening discipline, ed. Geoffrey Sampson and Diana
McCarthy. London / New York: Continuum, 78–84.
Sinclair, John McHardy. 2004c. How to use corpora in language teaching. Amster-
dam / Philadelphia: John Benjamins.
Sinclair, John McHardy. 2006. The case for a Corpus. Seminar given for the De-
partment of Foreign Languages and Literatures, Università Cattolica del Sacro
Cuore, Milan.
Stame, Stefania. 1999. I marcatori della conversazione. In La conversazione.
Un’introduzione allo studio della conversazione verbale, ed. Renata Galatolo and
Gabriele Pallotti. Milano: Raffaello Cortina Editore, 169–186.
Stern, Karen. 2005. The Longman Spoken American Corpus: providing an in-depth
analysis of everyday English, Pearson Longman,
<http://www.pearsonlongman.com/dictionaries/pdfs/Spoken-American.pdf>.
Stubbs, Michael. 1996. Text and corpus analysis: Computer assisted studies of lan-
guage and institutions. Oxford / Massachusetts: Blackwell.
Stubbs, Michael. 2001. Words and phrases: Corpus studies in lexical semantics. Ox-
ford / Massachusetts: Blackwell.
Stubbs, Michael. 2007. An example of frequent English phraseology: distributions,
structures and functions. In Corpus linguistics 25 years on, ed. Roberta
Facchinetti. Amsterdam / New York: Rodopi, 89–106.
Svartvik, Jan. 2007. Corpus linguistics 25 years on. In Corpus linguistics 25 years
on, ed. Roberta Facchinetti. Amsterdam / New York: Rodopi, 11–26.
Tannen, Deborah. 1982. The oral / literate continuum in discourse. In Spoken and
written language: Exploring orality and literacy, ed. Deborah Tannen, 1–16.
Norwood / New Jersey: Ablex Publishing Corporation.
Taylor, Christopher. 1999. Look who’s talking. An analysis of film dialogue as a
variety of spoken discourse. In Massed medias. Linguistic tools for interpreting
media discourse, ed. Linda Lombardo, Louann Haarman, John Morley, and
Christopher Taylor. Milano: LED, 247–278.
Taylor, Christopher. 2000a. In Defence of the Word: Subtitles as Conveyors of
Meaning and Guardians of Culture. In La traduzione multimediale. Quale tra-
duzione per quale testo?, ed. Rosa M. Bollettieri Bosinelli, Christine Heiss,
Marcello Soffritti, and Silvia Bernardini. Bologna: CLUEB, 153–166.
Taylor, Christopher, ed. 2000b. Tradurre il cinema. Atti del convegno organizzato
da G. Soria e C. Taylor 29–30 novembre 1996. Trieste: Università degli Studi
di Trieste.
Taylor, Christopher. 2000c. The subtitling of film; reaching another community.
In Discourse and community; doing functional linguistics, ed. Eija Ventola. Tü-
bingen: Gunter Narr Verlag, 309–330.
Taylor, Christopher. 2003. Multimodal transcription in the analysis, translation and
subtitling of Italian films. The Translator, Special Issue, 9(2): 191–208.
Taylor, Christopher and Anthony Baldry. 2004. Multimodal concordancing and
subtitles with MCA. In Corpora and discourse, ed. Alan Partington, John
Morley, and Louann Haarman. Bern: Peter Lang, 57–70.
Thomas, Jenny. 1995. Meaning in interaction: An introduction to pragmatics. Lon-
don: Longman.
Tognini-Bonelli, Elena. 2001. Corpus linguistics at work. Amsterdam / Philadelphia:
John Benjamins.
Ulrych, Margherita. 1999. Focus on the translator in a multidisciplinary perspective.
Padova: Unipress.
Wichmann, Anne. 2007. Corpora and spoken discourse. In Corpus linguistics 25 years
on, ed. Roberta Facchinetti. Amsterdam / New York: Rodopi, 73–88.
Wynne, Martin. 2004. Developing linguistic corpora: A guide to good practice.
London: AHDS. <http://www.ahds.ac.uk/creating/guides/linguistic-corpora/chapter6.htm>.
Last accessed on 27/07/2011.
Romanek, Mark. 2002. One hour photo. Fox Searchlight Pictures.
Soderbergh, Steven. 2000. Erin Brockovich. Universal Pictures and Columbia Pic-
tures.
Soderbergh, Steven. 2001. Ocean’s eleven. Warner Bros.
Van Sant, Gustav. 2000. Finding Forrester. Columbia Pictures.
Wachowski, Andrew Paul and Laurence Wachowski. 2003. The matrix reloaded.
Warner Bros.
Woo, John. 2000. Mission: Impossible II. Paramount Pictures and United International
Pictures.
LINGUE E CULTURE
Languages and Cultures – Langues et Cultures
This series, edited by the Department of Language Sciences and Foreign Literatures of the
Università Cattolica del Sacro Cuore in Milan, intends to publish scholarly reflections on the
languages and literatures taught within this Languages and Literatures Faculty.
The series is rooted in a tradition of studies which are both philologico-literary and linguistic
– a combination of approaches designed to be both rigorous and complementary. The
themes of the series will focus on linguistic, stylistic and literary studies related to both
European and extra-European cultures. The series will include mostly monographs and
doctoral theses.