Movie Language Revisited: Lingue E Culture

LINGUE E CULTURE
Languages and Cultures – Langues et Cultures
This book explores the linguistic nature of American movie conversation,
pointing out its resemblances to face-to-face conversation. The reason for
such an investigation lies in the fact that movie language is traditionally
considered to be non-representative of spontaneous language. The book
presents a corpus-driven study of the similarities between face-to-face
and movie conversation, using detailed consideration of individual lexical
phrases and linguistic features as well as Biber’s Multi-Dimensional
Analysis (1988). The data from an existing spoken American English corpus,
the Longman Spoken American Corpus, is compared to the American Movie
Corpus, a corpus of American movie conversation purposely built for the
research. On the basis of evidence from these corpora, the book shows that
contemporary movie conversation does not differ significantly from
face-to-face conversation, and can therefore be legitimately used to study
and teach natural spoken language.
Pierfranca Forchini has an MA in Foreign Languages and Literatures, an
MA in Theoretical and Applied Linguistics, and a PhD in Linguistic and
Literary Sciences. Her interests are the lexico-grammar interface of spoken
and movie language, American English phraseology and phonology, corpus
linguistics, contrastive linguistics and audio-visual translation. She
currently lectures in English Linguistics at Università Cattolica del Sacro
Cuore, Milan, Italy.
Movie Language Revisited

Evidence from
Multi-Dimensional Analysis and Corpora
ISBN 978-3-0343-1076-5
01
Peter Lang
www.peterlang.com
LINGUE E CULTURE
01
Collana diretta da / Series edited by / Collection dirigée par
Marisa Verna
Giovanni Gobber
Pierfranca Forchini

Movie Language Revisited
Evidence from
Multi-Dimensional Analysis and Corpora
PETER LANG
Bern · Berlin · Bruxelles · Frankfurt am Main · New York · Oxford · Wien
Bibliographic information published by die Deutsche Nationalbibliothek
Die Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data is available on the Internet at ‹http://dnb.d-nb.de›.
Library of Congress Cataloging-in-Publication Data
Forchini, Pierfranca.
Movie language revisited : evidence from multi-dimensional analysis and corpora /
Pierfranca Forchini.
p. cm. -- (Lingue e culture (languages and cultures - langues et cultures), v. 1)
Includes bibliographical references.
ISBN 978-3-03-431076-5
1. Conversation analysis. 2. Motion pictures. 3. Language and literature. I. Title.
P95.45.F67 2012
791.4301’41--dc23
2011047258
Cover Design : Didier Studer, Peter Lang AG
EISBN 9783035103250
ISBN 9783034310765
© Peter Lang AG, International Academic Publishers, Bern 2012
Hochfeldstrasse 32, CH-3012 Bern, Switzerland
info@peterlang.com, www.peterlang.com
All rights reserved.

All parts of this publication are protected by copyright.
Any utilisation outside the strict limits of the copyright law, without the permission of
the publisher, is forbidden and liable to prosecution.
This applies in particular to reproductions, translations, microfilming, and storage and
processing in electronic retrieval systems.
Printed in Switzerland
For my mom,
Mariateresa Negrinotti
Acknowledgments
This book would not have come into being without the precious help
and support from many people. First and foremost, I would like to
express my profound gratitude to the members of the Department of
Scienze Linguistiche e Letterature Straniere at Università Cattolica del
Sacro Cuore (Milan, Italy) for providing me with an intellectually
stimulating environment. Special thanks are due to Marisa Verna,
Head of the Department, and to Margherita Ulrych, my dissertation
supervisor. I first became interested in the comparison between movie
and face-to-face conversation during my PhD dissertation research and
I am extremely grateful for her guidance and encouragement in pursuing
this research. My heartfelt thanks also go to the two anonymous
referees, who made detailed and helpful comments, and to
Amanda Murphy, who read the manuscript and gave me incredible
support.
I would also like to thank Northern Arizona University (USA). I am
immensely grateful to Douglas Biber, who wrote the preface for
this book, and to Randi Reppen for offering time, collaboration, and
friendship. Douglas Biber’s ideas, methodology and meticulous com-
ments had a major influence on this book.
My sincere thanks also to John Sinclair, who is no longer with us,
but whose ideas and personality remain illuminating.
Last, but not least, my deepest gratitude and thanks to my extended
family for never-ending love and support.
And each of us, in the lifelong undertaking of analysis, of
study, of teaching, on the long journey through the
labyrinths of the spirit to which our vocation has driven us,
may perhaps think with modesty and also with joy that
this undertaking of having studied an object so
precious, so transcendent, so perfect, was worth
le blanc souci de notre toile
(the white care of our sail)
Sergio Cigada
Contents
Preface
Chapter 1
Opening Credits: Face-to-Face and Movie Conversation
1.1 Introduction
1.2 Spoken Language Studies
1.2.1 Determinants and Features of Face-to-Face Conversation
1.3 Movie Conversation Studies
1.3.1 Fictitiousness or Spontaneity in Movie Language?
1.4 Biber’s Multi-Dimensional Analysis
Chapter 2
The Making of: Methodology and Data
2.1 Introduction
2.2 Methodological Steps
2.3 The Longman Spoken American Corpus (LSAC)
2.4 The American Movie Corpus (AMC)
2.4.1 Building Criteria and Norming
2.4.2 Transcription Criteria and Tagging
Chapter 3
Shot 1: Multi-Dimensional Analysis of Face-to-Face and Movie Conversation
3.1 Introduction
3.2 Informational vs. Involved Production (Dimension 1)
3.3 Narrative vs. Non-Narrative Concerns (Dimension 2)
3.4 Explicit vs. Situation-Dependent Reference (Dimension 3)
3.5 Overt Expression of Persuasion (Dimension 4)
3.6 Abstract vs. Non-Abstract Information (Dimension 5)
3.7 Summary
Chapter 4
Shot 2: Close-ups
4.1 Introduction
4.2 Multi-Dimensional Analysis of Movie Genre
4.2.1 Informational vs. Involved Production (Dimension 1)
4.2.2 Narrative vs. Non-Narrative Concerns (Dimension 2)
4.2.3 Explicit vs. Situation-Dependent Reference (Dimension 3)
4.2.4 Overt Expression of Persuasion (Dimension 4)
4.2.5 Abstract vs. Non-Abstract Information (Dimension 5)
4.2.6 Summary
4.3 Phraseological Comparisons
4.3.1 Word Lists
4.3.2 Lexical Bundles: Two- and Four-grams
4.3.3 Multi-Word Sequences and Pattern Types
4.3.4 Summary
Chapter 5
Closing Credits: Implications and Applications
5.1 Authentic Movie Language
5.1.1 A Reflection of Spoken Language
5.1.2 A Source for Spoken Language Teaching
5.2 Concluding Remarks and Future Directions
Appendices
Appendix 1. Linguistic Features Codes (Multi-Dimensional Analysis)
Appendix 2. Face-to-Face Conversation Means Procedure
Appendix 3. Movie Conversation Means Procedure
Appendix 4. Movie Conversation Feature Counts
Appendix 5. Multi-Dimensional Analysis of Borderline Genre Movies
References
Preface
Over the last several decades, English-language movies (and television
shows) have probably had a greater influence on spreading English
world-wide than any other mechanism. Viewers around the world
regularly watch English-language movies, and many ELT professionals
advocate using such movies in the classroom to teach and model natu-
ral conversation. However, other professionals caution that movie lan-
guage is not natural conversation, and thus not an accurate model for
language learners.
Surprisingly, this issue has not been previously investigated empiri-
cally. But in this important book, Forchini does exactly that, using
large-scale corpus analysis to compare the linguistic characteristics of
movie dialogues with spontaneous face-to-face conversations. Forchini
applies a variety of research approaches, including detailed considera-
tion of individual lexical phrases and linguistic features, as well as
Multi-Dimensional Analysis, and also compares the characteristics of
movie comedies and dramas. The results will surprise some readers,
indicating that movie dialogues are in many ways very similar to spon-
taneous face-to-face conversations with respect to a wide range of
linguistic characteristics. In conclusion, Forchini discusses the impli-
cations of these findings for ELT professionals looking for ways to help
their students acquire natural conversational skills.
Douglas Biber
Regents’ Professor
Northern Arizona University
Chapter 1
Opening Credits: Face-to-Face
and Movie Conversation
1.1 Introduction
The present book explores the linguistic nature of American movie
conversation, pointing out its resemblances to face-to-face conversation.
Over the last 30 years, these two types of speech have been claimed to
differ in terms of language spontaneity. The former has been
traditionally defined as artificially written-to-be-spoken (Nencioni
1976, Gregory and Carroll 1978, Taylor 1999, Rossi 2003, Pavesi
2005) and deemed unlikely to comprise the features that characterize
conversation (Sinclair 2004b), whereas the latter has always been
considered the quintessence of spoken language, as it is totally
spontaneous (McCarthy 1998, Biber et al. 1999, Halliday 2005, Miller 2006).
Figure 1, which shows sub-categories of speech, illustrates this
traditional categorization: it classifies movie conversation as reciting and
as the speaking of what is written to be spoken as if not written, hence as
non-spontaneous speaking:
Figure 1. Spoken language sub-categories (Gregory and Carroll 1978: 47).

Yet, the following extract represents a case in point:
Extract 1:
Speaker1 Oh, my God. There she is. There’s Rosemary.
Speaker2 Where?
Speaker1 Right there.
Speaker2 Right where?
Speaker1 Straight ahead. Across the field.
Speaker2 Is she behind the rhino?
Speaker1 She’s right there! Mauricio, I want you to meet someone.
This is Rosemary Shanahan. Rosemary, Mauricio Wilson.
Speaker3 Hi. Nice to meet you.
Speaker2 Holy cow. I mean, uh… hi.
Speaker3 Is that uh a Members Only jacket?
Speaker2 Yes. Yes, it is.
Speaker3 So, what are you, like, the last member?
Speaker1 Oh, man. One-nothing Rosemary, I told you she was good.
Speaker3 Excuse me for just one second. Hello? Oh, hi, Mom.
Yeah, hold on. Will you guys excuse me for a second?
Speaker1 Sure. You want me to get something from the snack bar?
Speaker3 Yeah, uh get me a beer like nachos with all the stuff on it.
Would it not be problematic to identify the language as either face-
to-face or movie conversation? Consider, for example, the use of deixis
and ellipsis in the exchange (S1) Oh, my God. There she is. There’s Rose-
mary. (S2) Where? (S1) Right there. (S2) Right where? (S1) Straight ahead.
Across the field; and the use of pragmatic/discourse markers (I mean and
like), of response tokens (yeah), of hesitators/inserts (oh and uh), of
grammaticalized verbs (wanna), of vague language (stuff), and of mild
expletives (oh my God and holy cow). These would appear to be char-
acteristic of natural conversation, although the passage is actually taken
from the movie Shallow Hal (Bobby and Peter Farrelly 2001).
The following extract is from face-to-face conversation:
Extract 2:
Speaker1 Where are the plates?
Speaker2 Up on the shelf right here behind this door.
Right see right there on the, turn around. No, no, no.
Speaker1 <unclear>
Speaker2 Oh I thought you were getting the plates.
Speaker1 Yeah it’s behind here huh?
Speaker2 Yeah.
Speaker1 Okay.
Speaker2 Just run hot water. Let me get this.
Speaker1 Why are you doing that? It’s evening time now.
Speaker2 I know because I missed out.
Speaker1 You missed out.
Speaker2 I don’t know what I was thinking about.
Here, the use of deixis and ellipsis is significant: (S1) Where are the plates?
(S2) Up on the shelf right here behind this door. Right see right there on
the, turn around. No, no, no., as well as the use of response tokens and
hesitators/inserts such as yeah, huh, and oh. It would appear that the
speakers in Extract 2 use identical strategies to those of the speakers
in Extract 1 and, consequently, that face-to-face and movie conversa-
tion share recognizable spoken features. To what extent, then, are these
features present in the two conversational domains? Do they serve any
functions, and if so, are these functions similar or different? Further-
more, if these functions are similar, from a theoretical point of view,
what weight should the artificial status accorded to movie language
carry?
These questions are the grounds for the comparison between face-
to-face and movie conversation which the present investigation offers.
The reason for carrying out such a study lies firstly in the fact that
not many scholars have written about movie dialogues in general, and
very few have compared movie language to face-to-face conversation.
Some related studies have focused on issues of dubbing or sub-titling
movies into other languages, comparing the original and dubbed or
subtitled versions (Bollettieri Bosinelli 1998; Gottlieb and Gambier
2001; Pavesi 2005). Secondly, a considerable amount of work has been
carried out on movie scripts (Taylor 1999, Taylor and Baldry 2004),
but not on transcribed movie dialogues. Thirdly, some strongly-worded
claims about the non-spontaneity of movie language have been based
on intuition, rather than on empirical evidence: Sinclair (2004b: 80),
without providing data, maintains that movie language is “not likely
to be representative of the general usage of conversation” in that its
distinctive features do not “truly reflect natural conversation”. Lastly,
there are no studies of movie language that apply Biber’s (1988) Multi-
Dimensional Analysis approach, which has proven to be reliable as an
empirical method of describing the linguistic characteristics of texts.
Rather than trusting intuitive judgments or examining movie (web)
scripts, then, in this work I present an empirical investigation of the
linguistic similarities or differences between face-to-face and movie
conversation via Multi-Dimensional and corpus-driven analyses. I ad-
dress the following research questions: at a macro-level, to what ex-
tent do face-to-face and movie conversation differ or resemble each
other? At a micro-level, to what extent does movie genre influence this
similarity or difference? And to what extent do lexical bundles, multi-
word sequences and pattern types resemble each other or differ in the
two conversational types investigated? As a means to an answer, I com-
pare the data from an existing spoken American English corpus (the
Longman Spoken American Corpus) to a corpus of American movie con-
versation (the American Movie Corpus), which I purposely built (see
Chapter 2).
Conceptually, the work is divided into two main parts: Chapters 1
and 2 introduce the theoretical background, and Chapters 3 and 4
offer the practical analyses. Chapter 5, the concluding chapter, sum-
marizes the findings of the research and highlights its main implica-
tions and applications. In detail, Chapter 1, Opening Credits, illustrates
the reasons why spoken language is a relatively new field of research,
presenting the main contributions to studies of spoken language, and
providing an overview of the determinants and features that charac-
terize spontaneous conversation. It then explores movie conversation,
a type of speech whose status needs clarification on the basis of em-
pirical evidence. Thirdly, it illustrates in detail Biber’s Multi-Dimen-
sional Analysis, which is the statistical and computational methodo-
logy used to compare face-to-face and movie conversation. Chapter
2, The Making of 1, focuses on the main steps of this methodology
and describes the corpora used to retrieve the empirical data.
The data analysis, which was made possible especially thanks to the
collaboration and support of Douglas Biber at Northern Arizona Uni-
versity, is divided into two chapters, Shots 1 and 2. Chapter 3 (Shot 1)
zooms out for a long shot of face-to-face and movie conversation and
offers a quantitative and qualitative macro-investigation by applying
Biber’s Multi-Dimensional Analysis approach. Chapter 4 (Shot 2), in-
stead, zooms in for close-ups of movie genre as a variable that could
influence the analysis, and of lexical bundles, multi-word sequences
and pattern types in the two domains investigated by corpus-driven
analyses.
The crux of the book’s argument, reiterated in Chapter 5, Closing
Credits, is that movie conversation does not, in fact, differ significantly
from face-to-face conversation, and can therefore be legitimately used
to study spoken language. The extensive Multi-Dimensional Analyses
of the linguistic features in the two domains are presented in the Ap-
pendices.
1 In motion pictures, the making of (or making-of) is also known as behind
the scenes (or behind-the-scenes).
1.2 Spoken Language Studies
As McCarthy (1998) points out, spoken language is the primary occurrence
of language in human society. Miller and Weinert (1998: 4)
go so far as to say that spoken language has priority over written lan-
guage, and that this is “a central tenet of twentieth-century linguis-
tics”. Other scholars concur that it is “the most commonplace, every-
day variety of language” (Biber et al. 1999: 1038), and that the spoken
language has pride of place (Halliday 1987). It might also be the case
that neither speech nor writing is primary; however, they are differ-
ent systems that deserve careful exploration (Biber 1988), and it is only
since the 1980s that conversation has been researched specifically
(Halliday 1987).
Scholars have made various hypotheses as to why it took centuries
for speech to become the focus of language research. McCarthy (1998:
16) recalls that Samuel Johnson did consider the link between speech,
language use and language description in his famous dictionary, and
that manuals on rhetoric and eloquence in speech were printed from
the 16th century on. However, as Biber et al. (1999) point out, the
Western grammatical tradition has been founded almost exclusively
on the written language. One influence might be found in the Greek
origin of the word grammar itself – a letter, i. e. a piece of writing, or
written mark – but, in a sense, there has been no conscious choice
throughout most of the history of linguistics. In Halliday’s words
(2005: 159), “to study text, as data, meant studying written text; and
written text had to serve as the window, not just into written language
but into language”.
Other reasons could be linked to methods of data collection: re-
search into spoken language was not feasible before the invention of
the tape recorder, since linguists had no means of capturing spoken
language (Halliday 2005). Furthermore, although the emergence of
sizeable computer corpora has made a major difference to the amount
of available data, spoken language is more difficult to observe and to
codify. Even if spoken language is transcribed, “one is always dealing
with an imperfect product, especially compared to the accuracy with
which the latest optical text scanners can quickly gobble up vast
amounts of written text and deposit them in machine-readable form”
(McCarthy 1999: 13; see also Biber et al. 1999: 1041, Halliday 2005:
162).
The main contributions to studies of spoken language in the twen-
tieth century have come from scholars working in the field of Prag-
matics. They studied meaning in interaction, that is, those aspects of
language that cannot be considered in isolation from use, thus pro-
viding insights into speech acts (Austin 1962, Searle 1969), implicature
(Grice 1975), and politeness (Brown and Levinson 1987), inter alia.
Others working within the Conversation Analysis paradigm, and deal-
ing more specifically with talk-in-interaction (Thomas 1995), have
discovered and described patterns such as turn-taking (Sacks 1992;
Ford, Fox, and Thompson 2002) and adjacency pairs (Sacks 1992). A
new slant on spoken language was undoubtedly introduced by Cor-
pus Linguistics, which has facilitated the creation of research meth-
ods by attempting to trace a path from data to theory. For example,
the idea that speech has low informational content derives from the
statistical evaluation of word frequencies, which has shown that it has
a high frequency of pronouns and verbs and a low frequency of nouns
(cf. Biber et al. 1999, McCarthy 1999, Biber and Finegan 2001a,
Carter and McCarthy 2006). Another key area of corpus-based research
is register variation: it has been shown, for instance, that noun phrases
are more complex in written than in spoken texts, and that many of
the words of spoken language “clearly belong to the traditional prov-
ince of grammar / function words, in that they are devoid of lexical
content” (McCarthy 1999: 5; see also Halliday 2005).
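The frequency evidence referred to above can be made concrete with a minimal sketch. The Python snippet below is not drawn from the studies cited: the short pronoun list, the tokenizer, and the helper names are illustrative assumptions, and a real study would rely on a part-of-speech tagger rather than a closed word list. It simply computes a normed frequency (occurrences per 1,000 tokens), using two turns from Extract 2 as sample input.

```python
import re

# Illustrative (hypothetical) closed list of personal pronouns; a real study
# would identify pronouns with a part-of-speech tagger instead.
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
            "me", "him", "her", "us", "them"}

def tokenize(text):
    """Lower-case the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def per_thousand(text, word_set):
    """Normed frequency: occurrences of word_set items per 1,000 tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in word_set)
    return 1000.0 * hits / len(tokens)

# Two turns from Extract 2, typed with straight apostrophes.
sample = "Oh I thought you were getting the plates. Yeah it's behind here huh?"
pronoun_rate = per_thousand(sample, PRONOUNS)
```

Norming to a fixed basis is what allows texts of different lengths to be compared; on a sample this small the figure is of course only indicative.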
Thanks to such studies, it has been possible to identify a spectrum
of linguistic features which depend on what Biber et al. (1999: 1041)
label “determinants of conversation”. These determinants, which char-
acterize spontaneous conversation, will be described in the next sec-
tion. The term face-to-face conversation is used here to refer to sponta-
neous conversation which, as Miller and Weinert (1998: 22) point out,
“is typically produced by people talking face-to-face”.
1.2.1 Determinants and Features of Face-to-Face Conversation
The following determinants and features are claimed to characterize
face-to-face conversation:
a. Face-to-face conversation takes place in the spoken medium. It comes
about through the oral-auditory channel (Biber et al. 1999) and
consequently involves supra-segmental features like pitch, rhythm,
and voice quality (Miller 2006). At the same time, face-to-face con-
versation also occurs with non-verbal paralinguistic features such as
gestures, body postures, facial expressions, and eye-gaze/-contact which
also carry information (Erman 1987; Bercelli 1999; Contento 1999);
b. Face-to-face conversation takes place in real time; it is an impromptu
event which gives speakers no opportunity to edit what they say
(Chafe 1982; McCarthy 1999; Biber et al. 1999; Quaglio and Biber
2006). Speech is by nature evanescent: as Cameron (2001: 31) puts
it, “it consists of sound waves in the air, and sound begins to fade
away as soon as it is produced”. That is to say, if somebody utters
the words hello Rory, for example, by the time (s)he gets to Rory, it
is no longer possible to hear hello (Cameron 2001). The hearer
must, therefore, process the utterance as it happens, in real time.
Consequently, face-to-face conversation is subject to the limitations
of short-term memory in both speaker and hearer (Miller and
Weinert 1998; Miller 2006);
c. Face-to-face conversation typically takes place in a shared context, in
that the conversation participants usually share specific social, cul-
tural, and institutional knowledge. This entails that there is no need
for meaning to be made explicit or elaborated on, and consequently
“conversation can do without the lexical and syntactic elaboration
that is found in written expository registers” – noun phrases with
modifiers and complements are used more rarely than in written
language (Biber et al. 1999: 430). Lexical density is also typically
low (cf. Halliday 1985a, Biber et al. 1999);
d. Face-to-face conversation is interactive, continuous, and expressive of
politeness, emotion, and attitude. This is because face-to-face con-
versation is constructed by at least two interlocutors, i. e. a speaker
and a hearer, who dynamically shape their speech within the on-
going exchange. This oscillating movement between the interlocu-
tors is evident in utterance-response sequences, or adjacency pairs,
which “may be either symmetric, as in the case of one greeting echo-
ing another, or asymmetric, such as a sequence of question followed
by answer” (Biber et al. 1999: 1045). Another source of conver-
sational dynamism is the routine use of discourse markers and
similar devices such as interjections, gap fillers, hedges, tags, back-
channels, connectors, and vocatives, which signal the interactive na-
ture of the speaker’s utterance (cf. Erman 1987, Bazzanella 1990,
Gavioli 1999, Aijmer and Stenström 2005, Redeker 2006). The use
of politeness in exchanges such as greetings, requests, offers, and
apologies (Brown and Levinson 1987, Biber et al. 1999, Stame
1999) is another contributing factor to determinant (d).
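The low lexical density mentioned under determinant (c) can be illustrated with a minimal sketch. It assumes the simple definition of lexical density as the proportion of content (lexical) words among all tokens, which is only one of the definitions in use. The function-word list and the helper names are hypothetical and deliberately tiny; they are not the procedure of any study cited here.

```python
import re

# Small illustrative (hypothetical) list of function words: articles,
# prepositions, demonstratives, pronouns, auxiliaries, and conjunctions.
FUNCTION_WORDS = {
    "the", "a", "an", "on", "up", "behind", "in", "of", "to",
    "and", "but", "or", "this", "that", "i", "you", "it",
    "is", "are", "was", "were",
}

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lexical_density(text):
    """Share of tokens that are NOT in the function-word list (0.0 to 1.0)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    content = [tok for tok in tokens if tok not in FUNCTION_WORDS]
    return len(content) / len(tokens)

# A turn from Extract 2: nine tokens, four of which count as content words
# under the toy list above (shelf, right, here, door).
sample = "Up on the shelf right here behind this door."
density = lexical_density(sample)
```

Whether deictic adverbs such as here count as content words is itself a design decision; the toy list treats them as lexical, so the resulting figure should be read as an upper bound for this turn.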
The presence of supra-segmental and paralinguistic features (determi-
nant a) and shared knowledge in conversation (determinant c) is re-
flected in the conversants’ use of reference and implicit non-elaborated
meaning. These features convey information which may not be obvi-
ous to an outsider, and allow for a reduction of the number of words
uttered, and for simplified grammatical structures. An example of this
is the use of non-clausal or grammatically fragmentary components
such as “stand-alone words” (e. g. interjections) which “rely heavily for
their interpretation on situational factors” (Biber et al. 1999: 1042).
The high frequency of words with a referring function, such as pro-
nouns used in place of nouns, is another example of the way shared
knowledge influences the shape of the language of conversation (Biber
1988).
The on-the-fly trait of face-to-face conversation, determinant (b),
typically gives way to what has been called normal dysfluency (Biber et
al. 1999: 1048) and fragmented language (Chafe 1982: 39). As
McCarthy (1998) and Biber et al. (1999) point out, a speaker’s flow
is naturally impaired by pauses, repetitions (I – I – I ), and hesitators
(er, um), especially when the speaker needs to keep the conversation
going and his/her mental planning needs to catch up. Undoubtedly,
even though people do not have time to shape a flow of ideas into
complex and well-integrated utterances, some degree of planning may
be involved and the rate of communication can vary. This planning
clearly comes about before speech happens, in contexts where the
speaker knows what to say, or when the speaker and hearer share
knowledge (Biber et al. 1999).
Finally, the determinants of real time (b) and interactivity (d) lead
speakers to repeat the same repertoire of expressions, falling back on
prefabricated sequences of words. Speakers find it difficult to make
full use of their grammar and lexicon when there is time pressure.
Repetition of prefabricated word sequences, which are readily accessi-
ble from memory (cf. Chafe 1982, Tannen 1982, McCarthy 1998,
Bazzanella 1999, Biber et al. 1999, Cameron 2001, Halliday 2005),
can help them buy time to plan the next chunk of speech. Further-
more, unlike in written language, the fast rate of conversation con-
strains speakers to reduce the length of what they have to say to save
time and energy: in the words of Biber et al. (1999: 1048), “speed of
repartee, making an opportune remark, getting ‘a word in edgeways’
in a lively dialogue, or reaching the point quickly, may all add urgency
to the spoken word”.
Further evidence that speakers prefer repetitive structures is offered
by McCarthy (1999), who suggests the existence of a core vocabulary
of spoken English. The basis of his claim is the fact that “in compu-
ter-based frequency counts, there is usually a point where frequency
drops off rather sharply, from hard-working words which are of ex-
tremely high frequency to words that occur relatively infrequently”
(McCarthy 1999: 2). He identifies nine broad categories of a basic
spoken vocabulary:
– discourse markers (I mean, right, well, so, good, you know, anyway);
– modal items (modal verbs, lexical modals, adverbs, and adjectives);
– delexical verbs (do, make, take, and get);
– interactive words (just, whatever, thing(s), a bit, slightly, actually,
basically, really, pretty, quite, literally);
– basic nouns (person, problem, life, noise, situation, sort, trouble, family,
kids, room, car, school, door, water, house, TV, ticket);
– general deictics (this, that, here, there, now, then, ago, away, front,
side, …);
– basic adjectives (lovely, nice, different, good, bad, horrible, terrible);
– basic adverbs (especially those referring to time, such as today, yes-
terday, tomorrow, eventually, finally; frequency and habituality, such
as usually, normally, generally; and manner and degree such as quickly,
suddenly, fast, totally, especially);
– basic verbs for actions and events (sit, give, say, leave, stop, help, feel,
put, listen, explain, love, eat, enjoy).
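The frequency drop-off on which McCarthy bases his claim can be illustrated with a minimal sketch. The toy sample, the tokenizer, and the helper names (`ranked_frequencies`, `largest_drop`) are assumptions made purely for illustration; they are not McCarthy's procedure, which relies on counts from a large spoken corpus.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and return its word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())

def ranked_frequencies(text):
    """Return (word, count) pairs sorted from most to least frequent."""
    return Counter(tokenize(text)).most_common()

def largest_drop(ranked):
    """Return the 1-based rank after which the count falls most sharply."""
    counts = [count for _, count in ranked]
    drops = [counts[i] - counts[i + 1] for i in range(len(counts) - 1)]
    return drops.index(max(drops)) + 1

# Invented toy sample: three "hard-working" words and two rare ones.
sample = "you know you know you know you know well well well well obituary flyer"
ranked = ranked_frequencies(sample)
cut = largest_drop(ranked)
core = [word for word, _ in ranked[:cut]]  # words ranked above the drop
```

On a real corpus the drop is rarely this clean, so the cut-off point would normally be inspected by hand rather than taken automatically.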
To keep the conversation going, speakers also use redundant informa-
tion. Consequently, spontaneous speech typically contains a large
number of connectors, gap fillers, hedges, tags, backchannels, inter-
jections and discourse markers (Erman 1987, Cameron 2001).
All the features illustrated so far have been further confirmed by studies
which apply multivariate statistical techniques, such as factor and
cluster analyses2 (Biber 1985, Biber and Finegan 1986). These studies
show that face-to-face conversation is interpersonal, situation-depend-
ent, and has no narrative concern3, or as Biber and Finegan (1986)
put it, is a highly interactive, situated and immediate text type. It is
typically interactive because it displays “frequent occurrence of features
like first and second person pronouns, questions, hedges, contractions,
that-clauses, if-clauses” (Biber and Finegan 1986: 40). At the same time,
it is situated, because place and time adverbs are frequently used, and
immediate because present tenses are more common than past tenses.

2 Factor analysis “empirically identifies the groups of co-occurring linguistic
features and provides the basis for the interpretation of the underlying textual
dimensions in a given domain” (Biber and Finegan 1986: 23), whereas cluster
analysis “empirically identifies the groups of texts that are maximally similar in
their exploitation of the textual dimensions, providing the basis for interpretation
of these groupings as text types” (Biber and Finegan 1986: 23). For further
details see Biber (1988).
3 Face-to-face conversation also contains non-abstract information and is not
particularly persuasive (see Section 1.4 and Biber 1988 for details).

The above-mentioned traits of spontaneous face-to-face conversa-
tion make it typically informal. Consequently, since informality is of-
ten associated with laxity in terms of form, it is thought that face-to-
face conversation is not influenced by the traditions of prestige and
correctness associated with written texts, “where the English language
is on its best behaviour” (Biber et al. 1999: 1050). Two points can be
made in this regard. First, the linguistic features of spontaneous con-
versation, which are intrinsically pragmatic, should not be treated as
performance errors. These features simply reflect the conditions un-
der which it is produced: “the structures of spontaneous spoken lan-
guage have developed in such a way that they can [italics in text] be
used in the circumstances in which conversation […] usually takes
place” (Miller and Weinert 1998: 23). That is to say, in Schiffrin’s
(1987: 5) words, “language is potentially sensitive to all of the con-
texts in which it occurs […] and reflects [bold in text] these contexts
because it helps to constitute them”. Given the impromptu nature of
face-to-face conversation, devices like gap fillers are typically used to
keep the conversation going, and since this type of conversation usu-
ally takes place between people who are on familiar terms, a range of
informal expressions, such as non-standard verb forms like ain’t in Brit-
ish English, or y’all as a second person pronoun in Southern varieties
of American English may be employed (Biber et al. 1999: 1050).
Secondly, as Halliday (1985b) points out, it is only when spoken
language is perceived in terms of its transcription that it appears to
lack a clearly defined shape or form. If the transcription of a written
text included the planning processes behind it (e. g. brainstorming,
drafting, editing), the final text would appear amorphous as well.
1.3 Movie Conversation Studies
There are a number of works on aspects of movie language. The vast
majority has concentrated on dubbing and subtitling (cf. Menarini
1955, Baccolini and Bollettieri Bosinelli 1994; Pavesi 1994, 2005;
Bollettieri Bosinelli 1998; Pavesi and Malinverno 2000; Taylor 2000a,
2000b, 2000c, 2003; Gottlieb and Gambier 2001; Bruti and Perego
2005; Bruti 2006). Some other studies have concentrated on the tech-
nical terms in movie making (May 1962), on the communication be-
tween the text and the audience (Goffman 1976, 1979; Herbert and
Schaefer 1992; Bettettini 2004; Bubel 2008), or on the language of
movie scripts (cf. Taylor 1999, Taylor and Baldry 2004). What has not
been investigated to any great extent so far is the language actually spo-
ken in movies, as opposed to their dubbed versions.
Studies by Rossi (2003) and Pavesi (2005) stand out as exceptions,
since they identify some of the linguistic features of movie language.
However, both works are qualitative, and do not attempt to offer a
comprehensive description. As a matter of fact, considered as a whole,
studies on movie language offer quite contrasting observations. As stated
previously, movie conversation is traditionally described as a kind of
prefabricated speech which is written to sound like authentic speech
(cf. Nencioni 1976, Gregory and Carroll 1978, Taylor 1999, Pavesi
2005). Actors follow a written script, but their dialogues have to ap-
pear spontaneous within the artificial setting of the motion picture.
Taylor (1999) and Pavesi (2005), however, also describe this type of
conversational domain as displaying some of the features typical of
spoken language. These conflicting observations indicate that movie
language is a rather complicated domain to define, and it is likely that
this is due to the lack of empirical studies based on actual movie
transcriptions. This lack of empirical evidence can be ascribed to a
number of possible reasons. First of all, it is clearly difficult to decode
and analyze the language of movies. As Chaume (2004a: 14) and Pavesi
(2005: 9) point out, movies offer two channels of communication:
the auditory channel, involving the acoustic side of the movie, such
as the language(s) spoken, sounds, noises, etc., and the visual chan-
nel, which includes all the physical images within the movie, such
as road signs, shop fronts, clothes, colors, body movements, facial
expressions, and so on. The possible interplays and combinations
of these two channels are fundamental to the delivery of their “aural-
verbal, aural non-verbal, visual-verbal and visual non-verbal messages”
(Remael 2001: 14), and to the production of their meaning (Cattrysse
2001).
According to Chaume (2004c: 16–21), the following codes can be
distinguished:
1. the linguistic code, which is the language used;
2. the paralinguistic code, which denotes features that provide non-
verbal information, but which are still auditory (e. g. laughter);
3/4. the musical and the special effects code, which are represented
by songs and illusions created by props, camerawork, computer
graphics, etc.;
5. the sound arrangement code, which deals with features which ei-
ther belong to the story (i. e. diegetic sound) or to a person or
object that is not part of the story, such as an off-screen narrator
(i. e. non-diegetic sound). The sound arrangement code covers
both the sounds that are produced on-screen (those associated
with the vision of the sound source) and those produced off-screen
(those whose origin is not present in the frame and therefore not
visible simultaneously with the perception of the sound);
6. the iconographic code, which represents the icons, indices, and
symbols in the movie;
7. the photographic code, which deals with changes in lighting, in
perspective, or in the use of color (e. g. color vs. black and white
or intentional use of some colors);
8. the planning code, which depends on the types of shots (e. g. close-ups
and extreme close-ups);
9. the mobility code, which includes proxemic (i. e. related to space)
and kinetic (i. e. related to motion) signs, and the screen charac-
ters’ mouth articulation;
10. the graphic code, which is the written language present on screen
(i. e. titles, intertitles, texts, and subtitles);
11. the syntactic code, which is the editing, namely, the process of shot
associations.
The linguistic, the paralinguistic, the musical, the special effects, and
the sound arrangement codes are transmitted by the auditory/acous-
tic channel, whereas the iconographic, photographic, the planning, the
mobility, the graphic, and the syntactic/editing codes are transmitted
by the visual channel (Chaume 2004c: 16).
The lack of empirical evidence can also be ascribed to the fact that it
is not easy to find transcriptions of movie dialogues, and the absence of
freely available movie corpora is a hindrance to such research. Further-
more, scripts which can be downloaded from the internet differ consider-
ably from what is actually said in movies. It is not surprising, then, that
studies based on scripts make claims about the non-spontaneity of movie
language. Although such scripts are a genre in their own right, they are
in fact inappropriate for investigations on real movie conversation.
As a start, one common difference is the length of the transcripts.
To take an example from the AMC, the total number of words tran-
scribed for the movie Shallow Hal is 11,490, whereas the script re-
trieved from the web4 contains 10,660 words. Another important dif-
ference is the number of occurrences of features which are typical of
spoken language: there are 49 occurrences of you know and 31 of I
mean in the transcription of the movie Shallow Hal, whereas they oc-
cur respectively 38 and 23 times in the web script.
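A comparison of this kind (total word counts and occurrences of phrases such as you know and I mean in two versions of the same dialogue) can be sketched as follows. The function names and the two sample strings are hypothetical and invented for illustration; they are not taken from the AMC or from any web script, and the tokenization is deliberately crude.

```python
import re

def tokenize(text):
    """Lower-case the text and return its word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())

def count_phrase(tokens, phrase):
    """Count how often a multi-word phrase occurs in a list of tokens."""
    target = tokenize(phrase)
    n = len(target)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)

def compare(transcription, script, phrases):
    """Return (transcription, script) pairs for word totals and phrase counts."""
    t_tokens, s_tokens = tokenize(transcription), tokenize(script)
    report = {"words": (len(t_tokens), len(s_tokens))}
    for phrase in phrases:
        report[phrase] = (count_phrase(t_tokens, phrase),
                          count_phrase(s_tokens, phrase))
    return report

# Invented sample lines, NOT real movie data.
transcription = "Well, you know, I mean, it's fine. You know what I mean?"
script = "It's fine. You know what I mean?"
report = compare(transcription, script, ["you know", "I mean"])
```

Applied to the Shallow Hal figures quoted above, the same kind of comparison shows the transcription to be 830 words longer than the web script (11,490 vs. 10,660).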
The following extracts, which have been taken from the movie corpus
used in this study (the AMC, cf. Section 2.4) and from the internet,
show the extent of this difference5. Extract 3 demonstrates that the same
scene has the same content in the two transcriptions, but different wording:
both texts start with an introduction and a joke, but in the AMC
transcription it is Andrea, and not Emily, who talks first, and the sentence
about the joke is more explicit (cf. “Great. Human Resources
certainly has an odd sense of humor. Follow me.” vs. “They do have an odd
sense of humor”); Andrea also has a different last name in the two texts
(Sachs vs. Barnes). Moreover, the web script (which is deliberately presented
here in its online script format) also contains a description of the
setting, which is not present in the transcription, given that this information
is not relevant to the analysis of the language used in the movie.

4 Cf. <http://www.script-o-rama.com/movie_scripts/s/shallow-hal-script-transcript-paltrow.html>.
5 It is worth noting that the scriptwriter him/herself sometimes adds a note
underlining that the web script has not been written to represent the actual words
in the movie: “This transcript is not trying to get the movie word for word, but
close to it. This transcript is for reading purposes only!”; source:
<http://www.awesomefilm.com/script/MI2.html>.

Extract 3. A scene from The Devil Wears Prada: same content, different wording.
AMC transcription
Andrea: Hi. Uh, I have an appointment with Emily Charlton?
Emily: Andrea Sachs?
Andrea: Yes.
Emily: Great. Human Resources certainly has an odd sense of humor. Follow me.
Emily: Okay, so I was Miranda’s second assistant… but her first assistant recently got promoted, and so now I’m the first.
Andrea: Oh, and you’re replacing yourself.
Emily: Well, I am trying. Miranda sacked the last two girls after only a few weeks. We need to find someone who can survive here. Do you understand?
Andrea: Yeah. Of course. Who’s Miranda?
Emily: Oh, my God. I will pretend you did not just ask me that. She’s the editor in chief of Runway, not to mention a legend. You work a year for her, and you can get a job at any magazine you want. A million girls would kill for this job.
Andrea: It sounds like a great opportunity. I’d love to be considered.
Emily: Andrea, Runway is a fashion magazine so an interest in fashion is crucial.
Andrea: What makes you think I’m not interested in fashion?
Web (tran)script6
[column text not recoverable]
6 Extract from the screenplay by Peter Hedges. Source: <www.dailyscript.com/scripts/devil_wears_prada.pdf>.
Extract 4, instead, which focuses on conversation only, and illustrates
the first words uttered in the movie and written in the script, shows
that the two texts have a totally different content (i. e. they start in a
completely different way), and that even scripts which have been
stripped of redundant information (like the setting) and which only
contain words do not correspond to what is actually said in the movies.
Extract 4. The very first words uttered in Catwoman: totally different content.
AMC transcription
Patience: It all started on the day that I died. If there had been an obituary, it would have described the unremarkable life of an unremarkable woman, survived by no-one. But there was no obituary, because the day that I died was also the day I started to live. But that comes later….
Web (tran)script7
Edna: Yes?
Patience: Hi, Edna Powers? I’m Patience Price, I called about adopting a cat? I saw your flyer at my vet's office
Edna: Oh yes, do come inside.
7 Extract from the screenplay by Dan Waters. Source: <http://www.dailyscript.com/scripts/catwoman.pdf>.
It is quite obvious that scripts found on the internet cannot be used
to represent real movie language, and that movies need to be tran-
scribed if their language is to be studied.
Another possible reason for the lack of studies on movie language
may be prejudice against this kind of conversational domain as one
which does not provide evidence for spontaneous conversation, due
to its planned and therefore non-spontaneous nature (cf. Rossi 2003:
93). The following quote from Sinclair (2004b) clearly illustrates this
point (italic mine):
If it is impossible in an early stage of a project to collect the spoken language,
then there is a temptation to collect film scripts, drama texts, etc., as if they
would in some way make up for this deficiency. They have a very limited value
in a general corpus, because they are ‘considered’ language, written to simu-
late speech in artificial settings. Each has its own distinctive features, but none
truly reflects natural conversation, which for many people is the quintessence
of the spoken language. […] such records are not likely to be representative of
the general usage of conversation (Sinclair 2004b: 80).
Sinclair’s (2004b: 80) point of view is fundamental for the present work
for two reasons: first, because it draws attention to the fact that movie
language has its own distinctive features and thus strengthens the case
for searching for them; second, because it openly declares that movie
language has “a very limited value” since it does not reflect natural
conversation and, consequently, is “not likely to be representative of
the general usage of conversation”. The crucial missing element in the
comment is, however, empirical evidence, which is the objective of
the present work.
1.3.1 Fictitiousness or Spontaneity in Movie Language?
As mentioned previously, while scholars such as Nencioni (1976,
1983), Gregory and Carroll (1978), Taylor (1999), Rossi (2003) and
Pavesi (2005) highlight the artificiality of movie language, no clear pic-
ture of its nature emerges from the literature, because of the lack of
empirical studies. Movie language is described as artificial, yet, at the
same time, it is also considered a domain displaying elements of spon-
taneity (Taylor 1999, Pavesi 2005).
There is no doubt that in terms of literal spontaneity, movie lan-
guage is fictitious: it pretends to be authentic, but it is planned and
artificial. It is usually defined as non-spontaneous for three reasons:
first, because it is prefabricated; second, because it is written, or rather,
it is written to be spoken as if it were not written; third, because it
always implies some reciting, i. e. the speakers are actors who recite
and have to follow a script, or screenplay, even when they are asked
to improvise (Nencioni 1976, 1983; Gregory and Carroll 1978; Taylor
1999; Rossi 2003; Pavesi 2005).
There are also other factors which, it is claimed, create non-spon-
taneity: Taylor (1999) draws attention to the time and space constraints
of movies. Movie length is a constraint that leads to fictitiousness, in
that it obliges the scenes and language to be explicit and compact: there
is not much time and space for redundancy in a two-hour movie; con-
sequently, dialogic exchanges and scenes must be relevant and precise.
Even scenes that might be of interest are often edited out because of
the lack of space and time, although they are sometimes inserted in
extra sections of the final versions of DVDs.
Taylor (1999) also points out the need for movies to be appealing
and to be commercially successful, which naturally influences (and jus-
tifies) the linguistic choices made when building up the dialogue. Such
appeal is closely connected to two other important constraints, the
need to relate enthralling stories, and to prevent the audience from
losing track of the plot, which compromises the spontaneity of lan-
guage. These constraints make the spontaneity of language secondary
because, in the interest of box office sales, the story line needs to be
involving and clear. Strategies to achieve involvement and clarity can
be seen in the “excess of highly pertinent, dramatic or intriguing ex-
changes” of dialogues which are not “garbled”, but rather “clearly sepa-
rated” (Taylor 1999: 265–266). As an inevitable consequence, movie
conversation loses some spontaneity and acquires artificiality: even
when the audience is introduced to a scene that starts mid-conversa-
tion, which is supposed to recall spontaneous speech, the information
exchange is “often made artificially clear” (Taylor 1999: 267; cf. also
Pavesi 2005: 34). The same happens when an on-going conversation
is stopped by the introduction of a new scene and then re-presented:
the dialogue continues from the point it was at before the interrup-
tion, regardless of the time that has passed. Similarly, when an initial
topic of conversation gives way to a series of subtopics, so as to re-
semble spontaneous speech, movie dialogue still tends to “stick to the
point” (Taylor 1999: 267), whereas in spontaneous conversation to-
tally different subjects can easily emerge and then be abandoned.
Another artificial strategy used to help the audience keep track of
the movie is the introduction of a carefully planned rhythm of the
dialogue, which is slower and clearer than in naturally occurring con-
versation (Pavesi 2005). This is closely bound to another difference
from spontaneous conversation, mentioned by Quaglio and Biber
(2006) in connection with the TV series Friends. The language in
Friends “has almost no overlaps, to avoid possibility of misunderstand-
ings by the audience” and, “at the discourse level,” it has “far fewer
repetitions and interruptions” than those usually found in natural con-
versation (Quaglio and Biber 2006: 716–717). In a similar way, it is
claimed that features which are usually abundant in real conversation,
like discourse markers and vocatives, do not occur very frequently in
movies (cf. Chaume 2004b: 850; Pavesi 2005: 32).
Alongside the artificial strategies mentioned so far, some features
are mentioned in the literature which are, in fact, traces of spontane-
ity. The claim made here is that these traces of spontaneity depend
on the typical determinants of natural conversation. Thus, it is pos-
ited that these determinants can analogously be called determinants of
movie conversation.
Consequently:
(a) movie conversation takes place in the spoken medium and oc-
curs with non-verbal paralinguistic features;
(b) movie conversation not only pretends to take place in real time,
but actually does take place in real time, if real time is perceived
as an ongoing process. Although movies are pre-recorded and not
impromptu events, the audience perceives that something is hap-
pening while watching the movie: “the visual medium with mov-
ing images and the potential of exploiting the written and spo-
ken codes at the same time enhances the sense of immediacy”
(Mansfield 2006: 34; cf. also Pavesi 2005: 30);
(c) movie conversation usually takes place in a shared context;
(d) movie conversation is interactive, continuous, and expressive of
politeness, emotion, and attitude.
In detail, the presence of deictics and elisions, reported by Taylor
(1999) and Rossi (2003), is due to determinants (a) and (c): the
fact that movie conversation takes place in the spoken medium, with
non-verbal paralinguistic features (determinant a), and in a shared
context (determinant c) implies that speakers and listeners can rely on
implicit meaning or reference, and thus avoid elaboration or specification
of meaning.
The use of incomplete utterances, self-corrections / repairs, reformu-
lations, repetitions, insert breaks / pauses (Rossi 2003), and/or over-
lapping conversation (Taylor 1999) are all effects of determinant (b),
which is typically realized as normal dysfluency and fragmented lan-
guage of any conversation which takes place in real time.
Finally, the use of inserts, hesitators, vocatives, hedges, adjacency
pairs (question / answer), short and phatic devices, expletives, fillers,
tag questions, and discourse markers (Taylor 1999; Rossi 2003;
Quaglio and Biber 2006; Forchini 2010) can all be seen as realizations
of determinants (b) and (d): the on-the-fly and interactive character
of conversation leads speakers to use the same repertoire of expressions
to keep the conversation going.
The vocabulary of movie language offers further evidence of speak-
ers using repetitive structures when talking, which can be ascribed to
determinants (b) and (d). Indeed, it seems to favor a core vocabulary,
which usually avoids literary and dialectal terms, jargon and techni-
cisms (Pavesi 1994, 1996; Taylor 1999). This is a very interesting trait
for, although Pavesi (2005) points out the presence of artificial ele-
ments in movies8, the use of some of these features, like the simplifi-
cation of syntactic structures (cf. Biber et al. 1999) and the basic core
just mentioned, recalls the basic, or core, vocabulary typical of spon-
taneous spoken conversation (cf. McCarthy 1999: 2 and Section 1.2).
Another feature of movie language is that it tends to be informal,
due to the typical linguistic traits of spontaneous conversation illus-
trated so far. This feature is linked to the interactive determinant (d),
which highlights the interpersonal meta-function at work in movie
language, establishing and maintaining social relations (Halliday
1985a). An example of this informality is the use of syntactic-prag-
matic strategies like contractions (Quaglio and Biber 2006), fronting
(Taylor 1999), dislocations (Taylor 1999, Pavesi 2005), clefts (Pavesi
2005), and of dialogic two-grams9 such as are you, do you, all right,
come on, thank you (Forchini 2010).
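How two-grams such as are you or come on are retrieved from a text can be shown with a short sketch. The sample string and the helper names are invented for illustration. As a simplification, the sketch ignores punctuation, so pairs of words that straddle a sentence boundary are also counted.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and return its word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())

def two_grams(tokens):
    """Return every pair of adjacent tokens as a space-joined string."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

def top_two_grams(text, k=3):
    """Return the k most frequent two-grams with their counts."""
    return Counter(two_grams(tokenize(text))).most_common(k)

# Invented sample, NOT taken from the AMC.
sample = "Come on, are you ready? Come on! Do you know? Are you sure? Thank you."
top = top_two_grams(sample, k=2)
```

The same sliding-window idea extends to longer sequences (three-grams, four-grams) by pairing each token with more of its right-hand neighbors.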
Interestingly, these recurring interactive features in movie conver-
sation occur in various environments, similarly to spontaneous con-
versation. Pavesi (2005: 30) notes that dialogic exchanges between col-
leagues, friends, or neighbors may take place in any location, in
restaurants, at the mall, at the hairdressers’, etc. Equally, movie dia-
logue occurs in any kind of interaction, in symmetric (between equals)
or a-symmetric relationships (superior-inferior, doctor-patient, teacher-
learner), and in more than one language: movie dialogue can display
plurilingualism, code-switching (the movement from one language to
another), and code-mixing (hybridization) (Rossi 2003: 113).

8 Cf. the leveling out of sociolinguistic variation (i. e. dialectal traits, local and
colloquial tones are often deleted and/or simplified); of syntactic structures
(i. e. monoclausal utterances are usually preferred and subordination tends to
be distributed homogeneously, cf. Pavesi 2005: 32 and Rossi 2003: 103); of
lexical choices (i. e. usually movies offer the same core vocabulary, avoiding
literary and dialectal terms, jargon and technicisms, cf. Pavesi 2005: 33); of
turn taking and utterances (i. e. the latter tend to employ the same number of
words, cf. Pavesi 2005: 32); and of dialogues (like, for example, the reduced
and predictable use of phatic devices, interjections and discourse markers, Pavesi
2005: 34), which are often stereotyped.
9 N-grams (Fletcher <http://phrasesinenglish.org/>) are also referred to as lexical
bundles (Biber et al. 1999), sequences of words (Hunston 2006), clusters (Scott
and Tribble 2006), phrasal units (Stubbs 2007), etc.

This introductory section has illustrated that the literature describes
movie language as both fictitious and spontaneous. Movie speech can
be labeled “quasi-speech”, as Sinclair (2004b: 80) calls it, if the term
“speech” is identified with spontaneous spoken language; movie lan-
guage is indeed a spoken variety, but it is first written and then re-
cited. It cannot therefore be said to be 100% spoken: spontaneity is
just an illusion. Furthermore, some of the distinctive characteristics
of movie dialogue that contribute significantly to its non-spontaneity
are necessarily “imposed by the televized medium” (Quaglio and Biber
2006: 716) and are needed to fulfill a number of functions, such as
contributing to the unfolding and comprehension of the narrative.
Clear and concise dialogues, together with explicit and linear breaks
and blocks of information, help the audience understand what is go-
ing on (Taylor 1999, Pavesi 2005). It is curious to note that audiences
easily accept the non-spontaneous anomalies of movies (Pavesi 2005:
30), proving that movie non-spontaneity is not limiting. Besides, the
fact that movie dialogues imitate real dialogue without necessarily in-
cluding all the typical features of spontaneous spoken discourse (Taylor
1999, Chaume 2004b; Pavesi 2005) makes them a peculiar conversa-
tional domain and a variety of its own.
As for the degree of spontaneity present in movie language, some
scholars base their judgment on intuition, while others report defi-
nite instances of spontaneity. However, this evidence is not founded
on a systematic, quantifiable investigation, but is either qualitative, or
based on scripts, which have been shown to be inadequate represen-
tations of movie conversation. It seems clear that the status of movie
language needs clarification on the basis of empirical evidence.
The Multi-Dimensional Analysis presented here offers a docu-
mented analysis of movie dialogue. Through Biber’s methodology, it
is possible to demonstrate empirically the similarity or difference be-
tween movie and face-to-face conversation.
1.4 Biber’s Multi-Dimensional Analysis
Multi-Dimensional (MD) Analysis is a statistical approach implemented
by Biber (1988) for linguistic purposes. It applies multivariate
statistical techniques, such as factor analysis, to identify groups of lin-
guistic features that co-occur frequently in texts and thus determine
register variation. In factor analysis, a large number of original vari-
ables (the linguistic features, in this case) are identified and reduced
to a small set of derived variables called Factors: the variables that are
distributed in similar ways are grouped together so as to identify lin-
guistic correlations with minimum loss of information. Factors are, in
other words, linear combinations of the original variables, derived from
a correlation matrix of all the variables, which represent a group of
linguistic features. Each Factor is then interpreted functionally as a
Dimension of variation through a calculation of the communicative
functions most widely shared by the linguistic features. Biber’s (1988)
approach is based on the assumption that frequently co-occurring lin-
guistic features in texts share at least one communicative function, and
that it is possible to identify a unified Dimension underlying each set
of co-occurring linguistic features:
In the interpretation of a factor, an underlying functional dimension is sought
to explain the co-occurrence pattern among features identified by the factor.
That is, it is claimed that a cluster of features co-occur frequently in
texts because they are serving some common function in those texts (Biber
1988: 91).
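The idea of deriving a factor from a correlation matrix of linguistic features can be illustrated with a deliberately simplified sketch. It extracts only the first principal component of the correlation matrix by power iteration, which is a stand-in for, and not a reproduction of, the factor solution reported in Biber (1988). The feature names and the frequencies (per 1,000 words, for six hypothetical texts) are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(features):
    """Return feature names and their pairwise correlation matrix."""
    names = list(features)
    return names, [[pearson(features[a], features[b]) for b in names] for a in names]

def first_component(matrix, iterations=200):
    """Leading eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    vector = [1.0] + [0.0] * (len(matrix) - 1)
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(r * v for r, v in zip(row, vector)) for row in matrix]
        eigenvalue = math.sqrt(sum(p * p for p in product))
        vector = [p / eigenvalue for p in product]
    return eigenvalue, vector

# Invented frequencies: three conversation-like and three written-like texts.
features = {
    "pronouns":     [60, 55, 58, 20, 15, 18],
    "contractions": [40, 38, 42, 5, 3, 6],
    "nouns":        [150, 160, 155, 300, 310, 290],
    "prepositions": [80, 85, 82, 140, 150, 145],
}
names, matrix = correlation_matrix(features)
eigenvalue, vector = first_component(matrix)
# Loading of each feature on the first component.
loadings = {n: v * math.sqrt(eigenvalue) for n, v in zip(names, vector)}
```

In this toy data the features that rise and fall together receive loadings of the same sign, while the features distributed in the opposite way receive loadings of the opposite sign, which is the pattern that is then interpreted functionally.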
Another important aspect of Biber’s approach is the interpretation of
the complementary relationship between positive and negative scores.
Each Factor can have features with salient positive or negative weights;
the features characterized by negative weights co-occur, as do those
with positive weights (Biber 1988). For this reason, the absence or pres-
ence of some linguistic features is to be expected according to the cor-
related absence or presence of other features.
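The complementary relationship between positive and negative features can be made concrete with a small scoring sketch: each feature frequency is standardized across texts, the standardized values of the positive features are summed, and those of the negative features are subtracted. This broadly follows the logic of dimension scores in MD Analysis, but the feature names, the frequencies, the assignment of features to the positive and negative groups, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a list of frequencies to mean 0 and standard deviation 1."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

def dimension_scores(frequencies, positive, negative):
    """Sum the z-scores of positive features and subtract those of negative ones."""
    standardized = {name: z_scores(vals) for name, vals in frequencies.items()}
    n_texts = len(next(iter(frequencies.values())))
    return [
        sum(standardized[f][i] for f in positive)
        - sum(standardized[f][i] for f in negative)
        for i in range(n_texts)
    ]

# Invented frequencies per 1,000 words for four hypothetical texts:
# the first two conversation-like, the last two written-like.
frequencies = {
    "pronouns":     [60, 55, 20, 15],
    "contractions": [40, 38, 5, 3],
    "nouns":        [150, 160, 300, 310],
    "prepositions": [80, 85, 140, 150],
}
scores = dimension_scores(
    frequencies,
    positive=["pronouns", "contractions"],   # assumed positive features
    negative=["nouns", "prepositions"],      # assumed negative features
)
```

A text with many positive features and few negative ones thus receives a high positive score, and a text with the reverse distribution a negative one.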
The Dimensions usually considered in Multi-Dimensional Analysis
are represented by the following five Factors10 (Biber 1988):
– Factor 1, which represents the informational (negative) vs. involved
(positive) production dimension (Dimension 1), marks “high infor-
mational density and exact informational content versus affective,
interactional, and generalized content” (Biber 1988: 107). Factor
1 is determined by two main parameters: the primary purpose of
the writer / speaker, which can be either informational or interac-
tive, affective, and involved; and the production circumstances,
which can be characterized by either careful editing, precision in
lexical choices and an integrated textual structure, or by general-
ized lexical choices and fragmented presentation of information.
– Factor 2, which represents the narrative (positive) vs. non-narrative
concerns (negative) dimension (Dimension 2), “can be considered as
distinguishing narrative discourse from other types of discourse”
(Biber 1988: 109). More specifically, narrative concerns are marked
by the presence of past time, third person animate referents, re-
ported speech, and details, whereas non-narrative concerns are
marked by immediate time and attributive nominal elaboration.
– Factor 3, which represents the explicit (positive) vs. situation-depend-
ent (negative) reference dimension (Dimension 3), distinguishes “be-
tween highly explicit, context-independent reference and non-spe-
cific, situation-dependent reference” (Biber 1988: 110). Wh-relative
clauses, for instance, specify the identity of referents explicitly, whereas
time and place adverbials are dependent on referential inferences
(Biber 1988: 110).
10 Biber (1988 and 1995) and Conrad and Biber (2001) also consider two other
Factors (Factor 6 and 7), namely, the dimensions regarding “On-line Informa-
tional Elaboration Marking Stance” and “Academic Hedging”. These two fac-
tors (i. e. Dimensions 6 and 7 respectively) are not taken into account here for
they are considered still tentative by the literature given the difficulty of their
interpretation (cf. Conrad and Biber 2001: 39). It is worth noting, however,
that face-to-face conversation is usually unmarked in the use of the features
associated with these Dimensions.
– Factor 4, which represents the overt expression of persuasion (posi-
tive) dimension (Dimension 4), “marks the degree to which persua-
sion is marked overtly” (Biber 1988: 111). Biber holds that predic-
tion, necessity, possibility modals, together with infinitives, conditional
subordination, suasive verbs, and split auxiliaries mark persuasion.
– Factor 5, which represents the abstract (positive) vs. non-abstract
(negative) information dimension (Dimension 5), “seems to mark in-
formational discourse that is abstract, technical, and formal versus
other types of discourse” (Biber 1988: 113). Conjuncts, agentless
passive verbs, by-passives, and passive postnominal modifiers, inter
alia, have positive weights on this factor.
These factors are considered Dimensions in that they define “continuums
of variation rather than discrete poles” (Biber 1988: 9). This means
that Multi-Dimensional Analysis describes texts that are to be inter-
preted as more or less formal, narrative, explicit, etc., rather than either
formal or non-formal, narrative or non-narrative, explicit or situation-
dependent, etc. This approximation is clarified by the following texts
taken from Biber (1988: 10–12). Text 1 is an example of conversation,
while Text 2 is an example of scientific exposition. These contrasting
types might seem to be entirely different, because conversation involves
ordinary language, which is unplanned and interactive, whereas scientific
exposition is specialized, planned and non-interactive, but by looking
at Text 3, an example of a panel discussion, it is evident that these
parameters do not define clear-cut dimensions, but rather indicate a
continuum. Consequently, Text 1 can be described as less specialized,
less planned and more interactive than Text 2, and Text 3 as lying
between the two.
Text 1. Conversation (Biber 1988: 10).
Text 2. Scientific exposition (Biber 1988: 10).
Text 3. Panel discussion (Biber 1988: 11–12).
Biber’s (1988) Multi-Dimensional Analysis is an important milestone
in factor analysis implemented in language research, which brought
several advantages with it. First and foremost, it is reliable: apart from
Biber’s own work on register variation in speech and writing (1988
and 2006) and in cross-linguistic comparison (Biber 1995), a large
number of experiments have verified its reliability. Biber and Finegan
(2001a) and Atkinson (2001) have respectively focused on the histori-
cal evolution of register by analyzing diachronic relations among
speech-based and written registers, and scientific discourse across his-
tory, whereas Reppen (2001a) has analyzed register variation in speech
and writing. Furthermore, Biber and Finegan (2001b) and Conrad
(2001) have investigated specialized domains: the former have analyzed
medical research articles and the latter textbooks and journal articles
in biology and history (Conrad 2001). Helt (2001), Rey (2001) and
Quaglio (2009), instead, have studied dialect variation by comparing
British and American spoken English, male and female language in
American television series, and TV series conversation and face-to-face
conversation, respectively.
Secondly, as pointed out above, Biber’s (1988) approach has useful
applications, such as predicting the extent to which two linguistic fea-
tures vary when they occur together in texts:
A large negative correlation indicates that two features co-vary in a
systematic, complementary fashion, i. e. the presence of the one is highly
associated
with the absence of the other. A large positive correlation indicates that the
two features systematically occur together (Biber 1988: 79).
This means that if the factor analysis of a corpus reveals, for example,
that the occurrence of first person pronouns in a text is high, it can
then be expected that questions will occur to a similar extent; con-
versely, when first person pronouns are absent from a text, it is likely
that questions are absent too (Biber 1988: 80). Interestingly, research
has also led to the hypothesis of the existence of universal dimensions
of register variation, in that some dimensions seem to occur across lan-
guages and across general and restricted discourse domains (Biber 2004:
17).
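The co-variation described above, in which two features either occur together or stand in a complementary relation, can be illustrated with a minimal Python sketch. The per-text rates below are invented purely for illustration and do not come from Biber (1988), the LSAC or the AMC; the function name `pearson` and the three feature lists are assumptions of this sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical rates per 1,000 words in five texts (illustrative values only).
first_person = [70.0, 65.0, 20.0, 10.0, 55.0]
questions = [9.0, 8.0, 2.0, 1.0, 7.0]
nouns = [150.0, 160.0, 260.0, 290.0, 180.0]

r_pos = pearson(first_person, questions)  # features that rise and fall together
r_neg = pearson(first_person, nouns)      # features in complementary distribution

print(f"first person vs. questions: {r_pos:.2f}")
print(f"first person vs. nouns:     {r_neg:.2f}")
```

With these invented values the first pair yields a large positive coefficient and the second a large negative one, which mirrors the two situations distinguished in the quotation from Biber (1988: 79).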
Last but not least, through factor analysis, Biber’s (1988) Multi-
Dimensional approach provides both quantitative and qualitative
methods which empirically identify and interpret patterns which co-
occur as underlying dimensions of variation. It is useful to note that
this approach also works on small portions of corpora: even when
corpora are split into small parts and then investigated, factor analy-
sis provides nearly the same dimensions of variation, as long as the
samples of the corpora include an equivalent range of register varia-
tion (Biber 2004: 16).
Chapter 2
The Making of: Methodology and Data
2.1 Introduction
The present study advocates a data-oriented description of language
through computer analyses, and investigates meaning as function in
context. In corpus investigations, one of the very first decisions is to
determine the unit of analysis. This unit is typically one of two kinds:
either a single text, if the goal of the research is to distinguish a type of
text from a group of texts, or a linguistic feature, if the research aims to
describe single linguistic features in texts. In the first case, each “obser-
vation” (i. e. the unit of analysis) is a text, whereas in the second, each
observation is an occurrence of the structure in question (Biber, Conrad,
and Reppen 1998: 269). This book uses both kinds of units of analysis:
the text, or rather the texts from American face-to-face and movie con-
versation, and the linguistic features characterizing them.
To measure the observation and to carry out a number of quanti-
tative and qualitative analyses, it is necessary to code a large sample
of constructions and to consider each occurrence as a separate obser-
vation (Biber, Conrad, and Reppen 1998). In order to do so, a cor-
pus and a software program are required. The former contains, for in-
stance, the occurrences of the linguistic feature under examination and
consequent information on its frequency and pragmatic functions,
while the latter helps retrieve them. The data in this book come from
the Longman Spoken American Corpus and the American Movie Cor-
pus, representing face-to-face and movie conversation respectively, ex-
tracted with the Biber grammatical tagger, the SAS software package,
MonoConc Pro Version 2.0 (published by Athelstan), and Oxford
Wordsmith Tools 4.0 (developed by Scott 1998).
The next sections provide methodological information about Biber’s
(1988) approach, the compilation of the American Movie Corpus, cre-
ated to explore movie language, and the Longman Spoken American
Corpus, used to investigate face-to-face conversation.
2.2 Methodological Steps
Two distinct analyses are presented here: at a macro level, Biber’s (1988)
Multi-Dimensional Analysis is adopted to determine a) the text type
to which movie language belongs, and b) the co-occurrence relations
among linguistic features in face-to-face and movie conversation. At
a micro level, instead, the occurrences of specific linguistic features such
as single words, lexical bundles, and multi-word sequences and pat-
tern types are explored, using corpus-driven criteria (Francis 1993,
Tognini-Bonelli 2001, Biber 2009).
In Multi-Dimensional Analyses, the co-occurrence patterns, which
are dimensions of variation underlying the text, are identified first
quantitatively using factor analyses, and then qualitatively, by
interpreting their function. In order to apply the approach illustrated
in Chapter 1, the following eight methodological steps, suggested by Biber
(1988, 1995, 2004), were followed:
1. The corpus design, collection, transcription, and input into the com-
puter: the Longman Spoken American Corpus and the American
Movie Corpus, collected in electronic format and transcribed ac-
cording to corpus building criteria (cf. Sections 2.3 and 2.4), are
the sources of the data;
2. Identification of the linguistic features and of their functional asso-
ciations to be included in the analysis: the linguistic features and
their functional associations identified by Biber (1988) and
associated with Factors 1–5 and their corresponding Dimensions,
illustrated in Chapter 1, are considered;
3/4. Development of computer software programs which tag all relevant
linguistic features in the corpus and automatic tagging of the corpus
and editing of the texts to check whether the linguistic features are
accurately identified: both the corpora used were kindly tagged and
processed for Multi-Dimensional Analyses by Douglas Biber with
the Biber grammatical tagger he developed;
5. Counting of each linguistic feature in each text of the corpus via ad-
ditional computer programs: both the corpora were processed by
Douglas Biber with the SAS software package for statistical analy-
ses he adapted for linguistic studies; with the aid of the SAS soft-
ware package, the grammatical features mentioned in point (4)
were turned into the underlying Dimensions characterizing the two
conversational domains investigated here. The software programs
MonoConc Pro Version 2.0 (published by Athelstan) and Oxford
Wordsmith Tools 4.0 (developed by Scott 1998) were also used to
explore the occurrences of some linguistic features11;
6/7. Factor analysis of the co-occurrence patterns among linguistic features
and functional interpretation of the factors as underlying dimensions
of variation: Factors 1–5 were identified, first, quantitatively via
factor analyses, then, qualitatively via functional interpretations;
11 Both programs offer similar features, such as the ability to generate wordlists,
concordances and collocations, and they both can handle large tagged or
untagged corpora. The reason for using two software programs which can per-
form similar tasks was to compensate for their individual limits: MonoConc
Pro, for instance, can split the screen display and expand the context of the
node by highlighting the line in a more user-friendly way than Wordsmith Tools
4.0, while Wordsmith Tools 4.0 provides useful plots which give information
about the distribution of an occurrence in a single text or across texts, and
cluster information (cf. Reppen 2001b on the differences between the two
software programs). In particular, WordList, Concord, and Plot were used from
Wordsmith Tools 4.0: the first created lists of the single words or word-clusters
in the texts, set out in alphabetical or frequency order; the second retrieved
words or phrases in context to see the company they keep; and the third pro-
vided information about the distribution of an occurrence in a single text or
across texts (Scott 1998, Scott and Tribble 2006).
8. Computing of the dimension scores for each text; comparison of the
mean dimension scores for each register to analyze the salient lin-
guistic similarities and differences among the registers being studied:
the scores that emerged from the Multi-Dimensional Analysis
were computed and compared to analyze the salient linguistic
similarities and differences between face-to-face and movie con-
versation.
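Step 8 can be sketched in Python as follows. This is not the SAS procedure actually used for the study, but a simplified illustration of one common way of computing dimension scores: each normed feature rate is standardized across the texts, and the standardized values of the features with negative weights are subtracted from those of the features with positive weights. The feature names, the rates, the register labels and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def standardize(rates_by_text):
    """Turn each feature's normed rates into z-scores across all texts."""
    features = list(next(iter(rates_by_text.values())))
    z_scores = {text: {} for text in rates_by_text}
    for feature in features:
        values = [rates[feature] for rates in rates_by_text.values()]
        mu, sigma = mean(values), pstdev(values)  # population SD: an assumption
        for text, rates in rates_by_text.items():
            z_scores[text][feature] = (rates[feature] - mu) / sigma if sigma else 0.0
    return z_scores


def dimension_score(z, positive, negative):
    """Sum of z-scores of positive features minus sum for negative features."""
    return sum(z[f] for f in positive) - sum(z[f] for f in negative)


def register_means(scores, register_of):
    """Mean dimension score for each register."""
    grouped = {}
    for text, score in scores.items():
        grouped.setdefault(register_of[text], []).append(score)
    return {register: mean(values) for register, values in grouped.items()}


# Hypothetical rates per 1,000 words (illustrative values only).
rates = {
    "conv_1": {"contractions": 40.0, "second_person": 30.0, "nouns": 150.0, "prepositions": 80.0},
    "conv_2": {"contractions": 50.0, "second_person": 35.0, "nouns": 140.0, "prepositions": 75.0},
    "expo_1": {"contractions": 2.0, "second_person": 1.0, "nouns": 280.0, "prepositions": 140.0},
    "expo_2": {"contractions": 1.0, "second_person": 0.0, "nouns": 300.0, "prepositions": 150.0},
}
register_of = {"conv_1": "conversation", "conv_2": "conversation",
               "expo_1": "exposition", "expo_2": "exposition"}

# Illustrative subset of features for an involved vs. informational dimension.
positive = ["contractions", "second_person"]
negative = ["nouns", "prepositions"]

z = standardize(rates)
scores = {text: dimension_score(z[text], positive, negative) for text in rates}
means = register_means(scores, register_of)
print(means)
```

In this toy data set the conversational texts receive positive scores and the expository texts negative ones, so the two register means fall on opposite sides of the continuum.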
In corpus-driven analyses, the specific linguistic features are identified
both quantitatively and qualitatively. The quantitative analyses pro-
vide the frequency of the occurrences of such features, whereas the
qualitative analyses investigate functions in context to highlight func-
tional similarities and/or differences in face-to-face and movie con-
versation. When the occurrences of the items analyzed are too numer-
ous, the analyses are performed on a sample selection of the data and/
or via hypothesis testing, following the suggestions of Sinclair (1999),
as summarized by Hunston (2002: 52):
Sinclair (1999) advocates selecting 30 random lines, and noting the patterns
in them, then selecting a different 30, noting the new patterns, then another
30 and so on, until further selections of 30 lines no longer yield anything
new. An adaptation of this method is ‘hypothesis testing’, in which a small
selection of lines is used as a basis for a set of hypotheses about patterns. Other
searches are then employed to test those hypotheses and form new ones.
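The sampling procedure described in the quotation can be sketched in Python as follows. In practice the patterns are noted manually by the analyst; here that judgment is replaced by a hypothetical `pattern_of` function supplied by the caller, so the code only illustrates the stopping rule (keep drawing batches of 30 lines until a batch yields nothing new). The function name, the seed parameter and the toy concordance lines are assumptions of this sketch.

```python
import random


def sample_until_saturated(lines, pattern_of, batch_size=30, seed=0):
    """Draw random batches of concordance lines until no new pattern appears.

    Returns the set of patterns observed and the number of batches inspected.
    """
    rng = random.Random(seed)      # fixed seed so the sketch is reproducible
    remaining = list(lines)
    rng.shuffle(remaining)         # random order, sampled without replacement
    seen = set()
    batches = 0
    while remaining:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        batches += 1
        new_patterns = {pattern_of(line) for line in batch} - seen
        if not new_patterns:       # this batch added nothing: stop sampling
            break
        seen |= new_patterns
    return seen, batches


# Toy concordance lines for a node phrase; the "pattern" is simply the word
# that follows the node (a stand-in for the analyst's manual classification).
toy_lines = ["you know what"] * 100 + ["you know that"] * 100 + ["you know how"] * 5
patterns, inspected = sample_until_saturated(toy_lines, lambda line: line.split()[2])
print(sorted(patterns), inspected)
```

Rare patterns may be missed if sampling stops early, which is one reason why the hypothesis-testing adaptation mentioned in the quotation follows up with further targeted searches.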
The reasons for following Biber’s (1988) methodology have been stated
in Section 1.4 – i. e. it is reliable, it can predict linguistic features in
texts and can offer both quantitative and qualitative analyses. The rea-
sons for opting for authentic-data oriented analyses, instead, include
the facts that:
– they allow for descriptions of naturally occurring combinations of
words as opposed to the limiting and sometimes deviating traits
of intuition (cf. Sinclair 1991, 2006; Stubbs 2001; Börjas 2006;
Johansson 2007; Svartvik 2007);
– they allow for both quantitative and qualitative research (cf. Biber,
Conrad, and Reppen 1998; Aarts 2001);
– they are empirical and lead researchers toward descriptivism as op-
posed to prescriptivism (cf. Halliday 2003c; Biber, Conrad, and
Reppen 1998; Sinclair 2006);
– they can be collected in electronic format in large databases (i. e.
corpora) and can, consequently, be easily and quickly processed,
replicated, and shared (cf. Sinclair 1991, 2004a; Stubbs 2001;
Wynne 2004; Börjas 2006).
2.3 The Longman Spoken American Corpus (LSAC)
The Longman Spoken American Corpus (henceforth LSAC), which is
taken from the Longman Spoken and Written English Corpus12 and be-
longs, together with the Longman Written American Corpus, to the
Longman Corpus Network13, is the source of data used here to investi-
gate and represent contemporary American face-to-face conversation.
The LSAC consists of five million words taken from at least four hours
of the daily conversations of American speakers from all regions of the
US, chosen as representative of gender, age, ethnicity, and education.
The conversations, which took place over periods of at least four days,
were recorded as unobtrusively as possible by project workers with tape
recorders and were subsequently edited to eliminate silences and gar-
bled material and then transcribed. So as to guarantee anonymity, any
names, addresses, and phone numbers that were mentioned during the
recordings were not transcribed, even though records of the details were
kept (Stern 2005).
12 I was kindly given access to the LSAC by Douglas Biber and Randi Reppen.
13 In particular, the Longman Spoken American Corpus is owned by Pearson Edu-
cation and was gathered by Professor Jack Du Bois and his team at the Univer-
sity of California at Santa Barbara (UCSB).
2.4 The American Movie Corpus (AMC)
The American Movie Corpus (henceforth AMC) is a corpus I specifically
developed according to the criteria outlined in Sections 2.4.1 and
2.4.2 for the study of American movie language and dubbing into
Italian. In technical terms, it is a sample parallel bilingual corpus. It
can be described as a sample, in that it does not claim to be repre-
sentative of the whole variety under examination, i. e. movie language
and dubbing, but rather aims to provide a representative snapshot of
it. It is parallel and bilingual because it is made up of original texts
(original American movies) which are parallel to their translated ver-
sions (the relative dubbed Italian movies). In total, it consists of
204,636 words14, which amounts to nearly 44 hours of movie speech
in both American English and Italian. Since my present research does
not consider dubbing, only the original versions of American movies
are analyzed.
There were two reasons for building up a movie corpus: first, in
spite of the relatively large number of available spoken American English
corpora (cf. the American National Corpus, ANC; the Corpus of
Contemporary American English, COCA; the Michigan Corpus of
Academic Spoken English, MICASE; the Bank of English, the Santa
Barbara Corpus, etc.), to my knowledge, no corpus provided appro-
priate material for the language analysis of American movies; second,
as demonstrated in Chapter 1, the scripts which are easily accessible
and freely downloadable from the web turned out to be inappropriate
for this kind of investigation, in that their transcriptions of speech dif-
fer considerably from movie dialogues. Faced with this evidence, the
decision was taken to manually transcribe the original and dubbed Ital-
ian versions of the following American movies: Mission: Impossible II,
or M:I2 (John Woo 2000); Erin Brockovich (Steven Soderbergh 2000);
14 More specifically, the original English component is made up of 104,530 words,
and the dubbed one consists of 100,106 words.
Me, Myself & Irene (Bobby and Peter Farrelly 2000); Meet the Parents
(Jay Roach 2000); Finding Forrester (Gus Van Sant 2000); Shallow Hal
(Bobby and Peter Farrelly 2001); Ocean’s Eleven (Steven Soderbergh
2001); One Hour Photo (Mark Romanek 2002); The Matrix Reloaded
(Andy and Larry Wachowski 2003); Catwoman (Pitof 2004), and The
Devil Wears Prada (David Frankel 2006).
2.4.1 Building Criteria and Norming
In principle, any collection of texts can be called a corpus (corpus being
Latin for body), hence a corpus is any body of texts (cf. McEnery
and Wilson 1996); however, in corpus linguistics, a corpus is not a
mere collection of texts, but texts that are collected specifically for lin-
guistic analysis, according to precise compilation criteria (Renouf
1997). Sinclair (1991), McEnery and Wilson (1996) and Kennedy
(1998), inter alia, suggest four fundamental characteristics of modern
corpora, which influenced the design of the AMC: sampling and rep-
resentativeness (and consequent balance), standard reference, finite size,
and machine-readable format.
As many scholars point out (cf. Sinclair 2004b, Renouf 1997), one
of the limits of corpus studies is that no corpus, regardless of its size
or design, can precisely reflect and capture the language as a whole
and accurately represent it. It is nevertheless feasible to build up a sam-
ple corpus which provides an acceptable view of the tendencies of the
language population one wishes to study by limiting the population
itself (cf. Biber 1993). “The notions of representativeness and balance
are, […] in the final analysis, matters of judgment and can only be
approximate”; indeed, “generalizations are an essential part of science”
(Kennedy 1998: 62).
Although any movie could be said to represent movie language, in
an empirical comparative study, in which the data must be controllable
as well as comparable, variables need to be limited. As a consequence,
since the interest of the present research is the investigation of spe-
cific features of contemporary American movie conversation compared
to American face-to-face conversation, the movies selected had to sat-
isfy certain parameters. To be selected for the AMC, movies had to:
(a) be produced in the United States from 2000 on;
(b) be acted / spoken mostly in American English;
(c) not be set in previous centuries and eras;
(d) have ordinary life settings.
Parameters (a) and (b) determine the kind of domain and variety un-
der examination, i. e. American movie language. Parameter (b) also re-
flects the idea of dialogue in action, that is to say, dialogue had to be
present in the movie selected (e. g. narrated movies, documentaries and,
of course, silent movies had to be excluded). Parameter (a), together
with (c), guarantees the contemporaneity of the movies and param-
eter (d) ensures that the language spoken in the movies selected is the
ordinary language of ordinary people: specialized language, for exam-
ple, is not the focus of the present study; consequently, movies con-
taining a high proportion of political debates, academic speeches, le-
gal language and other specific domains were not included. It is worth
noting that, even though some of the characters of the movies selected
have extraordinary powers, they are humans who lead ordinary lives
– Neo in The Matrix Reloaded, for instance, is a man who works in
ordinary information technology and is not a robot or a machine.
Finally, the AMC movies were categorized as belonging to genres
of comedy, non-comedy (i. e. drama, sci-fi, thriller, action, adventure,
crime, fantasy movies) or borderline (when the categorization could
not be clear-cut). On the one hand, including different genres satis-
fied the parameters of balance and representativeness, which require
the full range of linguistic variation existing in the language (Biber
1993, Kennedy 1998). On the other, this provided an opportunity
to see whether genre variation influences the frequency of the spoken
devices analyzed. Movie genre is, in fact, a complex issue: it can be
seen in Table 1 that different reference works categorize movies in dif-
ferent ways. It is also interesting to note that movies usually do not
belong to only one genre. Accordingly, the AMC components are de-
fined along a comedy/non-comedy continuum, which takes into ac-
count the categorization provided by Morandini et al. (2006) and by
the Internet Movie Database15 (henceforth IMDB). Table 1 illustrates
the AMC components grouped according to their genre: four movies
are considered to be 100 % comedies, which is in tune with both
Morandini et al. (2006) and the IMDB classifications. Similarly, four
others are considered to be 100 % non-comedies. Three movies are
labeled as not genre specific, since the two reference works selected
characterize them in different ways. Due to this ill-defined status, these
movies are referred to as borderline movies in the analysis.
Table 1. AMC movies and genres.

MOVIE                    IMDB                            Morandini (2006)   CONTINUUM
Shallow Hal              Comedy / Drama / Romance        Comedy             100% COMEDY
Meet the Parents         Comedy                          Comedy             100% COMEDY
Me, Myself & Irene       Comedy                          Comic              100% COMEDY
The Devil Wears Prada    Comedy / Drama                  Comedy / Drama     100% COMEDY
Ocean’s Eleven           Comedy                          Thriller           50% COMEDY, 50% NON-COMEDY
Erin Brockovich          Biography / Drama               Comedy             50% COMEDY, 50% NON-COMEDY
Finding Forrester        Drama                           Comedy / Drama     50% COMEDY, 50% NON-COMEDY
One Hour Photo           Drama / Mystery / Thriller      Drama              100% NON-COMEDY
Catwoman                 Action / Crime / Fantasy        Sci-Fi             100% NON-COMEDY
Mission: Impossible 2    Action / Adventure / Thriller   Adventure          100% NON-COMEDY
The Matrix Reloaded      Action / Sci-Fi / Thriller      Sci-Fi             100% NON-COMEDY
15 <http://www.imdb.com/>.
The idea behind building the movie corpus is to set the foundation
stones of a large, standard reference corpus for future studies of the
movie language it represents (cf. Section 2.4.2). However, since a finite
number of words are often determined at the beginning of a corpus-building
project (Sinclair 1991, McEnery and Wilson 1996), a finite number of
movies were selected to start with. As stated above (cf. Section 2.4),
the transcriptions of the eleven American movies, together with their
dubbed Italian versions, make up a 204,636-word corpus (nearly 44 hours
of movie conversation). Undoubtedly, by some standards, the AMC is a
small corpus. Sinclair (2004a) holds that big corpora are usually
favored for linguistic research due to the fact that they are more
likely to include regularities of language. However, as
Kennedy (1998: 68) points out, “a huge corpus does not necessarily
‘represent’ a language or a variety of a language any better than a
smaller corpus”; indeed, everything depends on the patterns present
in the corpus and the consequent generalizations that can be made
about them.
Another issue relevant to the size of the AMC is comparability: ide-
ally, two (or more) corpora which are compared should be of the same
size; however, if they are not, data comparison is still feasible by means
of norming the numbers. As Biber, Conrad, and Reppen (1998: 263)
report, it is always possible to adjust data via normalization when cor-
pora are not of the same length, and raw frequency counts are not
directly comparable. Texts of different length can thus be compared
accurately: the raw frequency count is multiplied by whatever basis is
chosen for norming, and then divided by the total number of words.
The basis I chose for norming is 1,000, so the raw frequency count
was multiplied by 1,000, and was then divided by the total number
of words in the text.
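The norming calculation just described can be written as a short Python function. The corpus sizes in the example are those reported in this chapter for the English component of the AMC (104,530 words) and for the LSAC (five million words); the raw counts, by contrast, are invented for illustration, and the function name `normed_frequency` is an assumption of this sketch.

```python
def normed_frequency(raw_count, total_words, basis=1000):
    """Normed rate: raw count multiplied by the basis, divided by text length."""
    if total_words <= 0:
        raise ValueError("total_words must be a positive number")
    return raw_count * basis / total_words


# Hypothetical raw counts of one feature in two corpora of very different size.
amc_rate = normed_frequency(250, 104_530)       # AMC English component
lsac_rate = normed_frequency(12_000, 5_000_000)  # LSAC

print(f"AMC:  {amc_rate:.2f} per 1,000 words")
print(f"LSAC: {lsac_rate:.2f} per 1,000 words")
```

Although the raw counts differ by a factor of almost fifty, the normed rates are nearly identical, which is why normed rather than raw frequencies are compared across corpora of unequal size.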
The last factor taken into account in compiling the corpus is linked
to its format: the manually transcribed movie dialogues were stored
in Text, Word, and Excel files (.txt, .doc, and .xls respectively). A ma-
chine-readable format favors both quantitative and qualitative data
studies; it allows for computerized storing, quick searching and ma-
nipulating of data, which can be easily enriched with extra informa-
tion. Furthermore, the machine-readable format allows for objectiv-
ity and replicability of the studies, and ensures that information can
be exchanged within the scientific community, and the data re-used.
The .txt, .doc, and .xls formats were chosen to facilitate data process-
ing and information retrieval. In particular, the .txt files were neces-
sary for the Wordsmith Tools 4.0 software (used for concordances and
frequency counting), whereas the .doc and the .xls files were compiled
to include extra information such as the name of speakers, setting and
relevant extra-linguistic features, which are not included in the
Wordsmith Tools 4.0 analyses.
2.4.2 Transcription Criteria and Tagging
As mentioned above, one of the main advantages of computerized data
is that they can be replicated, which means that successive studies on
the same data will not need to re-computerize the information. How-
ever, to ensure possible interchange of information and reusability of
the resources within the scientific community, it is important that cor-
pora are built according to standard data-storing and representative-
ness criteria (Johansson 1993).
For the AMC annotation, a verbatim record of what is actually said
in the movies was transcribed orthographically. The speaker identifi-
cation and orthographic transcription process of the AMC followed
some of the international standards for transcribing spoken corpora,
provided by the Linguistic Data Consortium16 (LDC). These conven-
tions are summarized in Tables 2 and 317.
Table 2. Speaker identification and orthographic transcription conventions used in
compiling the AMC.

Speaker Identification
    At the beginning of each transcript, the speaker is given a unique
    identifier if the name is not present. In this case the speaker's
    gender is also indicated.
Capitalization
    Capitalization is used as an aid for human comprehension of the text.
    The accepted standard way to capitalize words, including words at the
    beginning of a sentence, proper names, and so on, is followed.
Abbreviations
    When abbreviations are used as part of a personal title, they can
    remain as abbreviations:
        Mr. Brown
        Mrs. Jones
        Dr. Spock
    However, when they are used in any other context, they are written
    out in full, e. g.:
        I went to the junior league game.
        I'm going home to see the missus
        I went to the doctor, and all he said was, don’t worry, it’s natural.
        Hey mister, do you know how to get to the stadium?
Contractions and Apostrophe -s
    Table 3 below illustrates what is considered standard written English
    with respect to contractions.
16 <http://www.ldc.upenn.edu/About>.
17 Cf. also <http://projects.ldc.upenn.edu/SBCSAE/transcription/csae-conventions.html#ortho>.
Table 3. Transcription conventions used for the AMC compilation for contractions
and apostrophe s18.

Complete words           Contraction allowed
I have                   I’ve
Cannot                   can’t
will not                 won’t
you have                 you’ve
could not                couldn’t
we will                  we’ll
should have              should’ve
it is                    it’s
Marvin – possessive      Marvin’s
going to                 Gonna
want to                  Wanna
she is                   she’s
Marvin is                Marvin’s
Marvin has               Marvin’s
The orthographic transcriptions were then checked for accuracy by native speakers of English and Italian who were not involved in the transcription. There were three reasons for choosing orthographic transcription. First of all, it provides a representation of spoken language which is simple to read and understand: compared to IPA transcription, for instance, orthographic transcription is easier because it requires less effort and knowledge (cf. Halliday 1985b). Secondly, it allows for immediate computing processes such as frequency and concordancing information retrieval: orthographic transcription is the format which concordancers usually read. Thirdly, even though orthographic transcription is “an imperfect written approximation of a speech event” (Kennedy 1998: 82), and cannot capture all the features of spoken conversation (Halliday 1985b, Wichmann 2007), it forms the basis for all other transcriptions and annotations. This means that the corpora could easily be tagged, and thus enriched with extra information, so as to be processed by the SAS software package used in Multi-Dimensional Analysis. Both the LSAC and the AMC were tagged with the Biber grammatical tagger: Extract 5 illustrates a tagged sentence (I know we haven’t been together that long, but these last ten months have just been the happiest of my life) from the AMC.
18 The contractions in bold are not allowed in the LDC transcription guides. However, they were kept in the present research for two main reasons: firstly, because they reflect what is actually said in the movies; secondly, because they are also present in the Longman Spoken American Corpus, which is the corpus used for the present comparative study.
Extract 5. An example of tagged AMC.
As Extract 5 shows, each tagged corpus has a vertical format which displays each word on a separate line, a space after the word, and the tag beginning with ^. This way of tagging includes more information about certain words (consider been, for example, which has four labels)¹⁹, and the + mark separates the different fields of information for each word.
19 For further details see Biber, Conrad and Reppen (1998: 259).
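The vertical format just described lends itself to simple line-by-line processing. The following Python sketch is only an illustration under stated assumptions: it takes the layout from the prose description alone (one word per line, a single space, then a tag that begins with ^ and uses + between fields). The function name parse_tagged_line and the sample tag labels are invented for the example and are not taken from the actual tag set of the Biber tagger.

```python
def parse_tagged_line(line):
    """Split one line of a vertically formatted tagged text into (word, fields).

    Assumed layout (from the prose description only): WORD, one space,
    then a tag that begins with '^' and separates its fields with '+'.
    """
    word, separator, tag = line.rstrip("\n").partition(" ^")
    if not separator:
        raise ValueError(f"no tag found in line: {line!r}")
    return word, tag.split("+")


# Hypothetical sample lines; the tag labels are made up for illustration.
sample = [
    "have ^vb+hv+aux",
    "been ^vprf+ben+aux+xvbn",
]

for line in sample:
    word, fields = parse_tagged_line(line)
    print(word, fields)
```

Keeping the fields as a list rather than a single string makes it straightforward to count how many labels a word carries, which is the property the description highlights for been.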
Chapter 3
Shot 1: Multi-Dimensional Analysis
of Face-to-Face and Movie Conversation
3.1 Introduction
In this chapter, Multi-Dimensional Analyses verify the extent to which face-to-face and movie conversation differ or resemble each other in terms of the linguistic features which characterize spoken discourse. As pointed out in Chapter 1, it has been said that these two domains differ in terms of spontaneity: face-to-face conversation, which is totally natural and spontaneous, is described as the “quintessence of the spoken language”, whereas movie conversation, which is artificial and non-spontaneous, is considered “not likely to be representative of the general usage of conversation” (Sinclair 2004b: 80).
Interestingly, contrary to what has been maintained for about thirty years (cf. Chapter 1), the empirical data presented here reveal that face-to-face and movie conversation do not differ to any great extent. The most frequent linguistic features in both corpora, identified statistically through factor analysis (cf. Table 4), are, for example, verbs (in particular, uninflected presents, imperatives and third persons – pres in the tables), second person pronouns and possessives (pro2), first person pronouns and possessives (pro1), nouns (n), and prepositions (prep). In addition, the least frequent features are wh-pronouns functioning as relative clauses in object position (rel_obj), as relative clauses in subject position (rel_subj), and as relative clauses in object position with prepositional fronting (rel_pipe), suasive verbs (e. g. ask, command, insist – sua_vb), passive verbs + by (by_pasv), and passive postnominal modifiers (whiz_vbn). The statistics of these and other linguistic features, which are illustrated in detail in the next sections, are given in Table 4.
Table 4²⁰. Linguistic features of movie and face-to-face conversation.
20 The variables in all the tables are the linguistic features analyzed (for the meaning of the codes of the specific features see Appendix 1). N stands for the number of texts selected (with regard to movie conversation the number 3 refers to the three sub-genres, or sub-corpora, labeled comedies, borderline movies, and non-comedies; with regard to face-to-face conversation 327 refers to the number of texts included in the LSAC). Mean is the average frequency of items. The frequency counts of all linguistic features are normalized to a text length of 1,000 words.
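The normalization to a text length of 1,000 words mentioned in the note can be illustrated with a short Python sketch. It is not the procedure actually run for the study (the counts were processed with the Biber tagger and SAS); the function name normalize_per_thousand and the figures in the example are hypothetical.

```python
def normalize_per_thousand(raw_count, text_length, basis=1000):
    """Convert a raw frequency into a rate per `basis` words of running text."""
    if text_length <= 0:
        raise ValueError("text_length must be a positive number of words")
    return raw_count * basis / text_length


# Hypothetical figures: 42 occurrences of a feature in a 3,500-word text.
rate = normalize_per_thousand(42, 3500)
print(rate)  # 12.0 occurrences per 1,000 words
```

Normalized rates of this kind are what make texts of different lengths comparable with one another.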
This preliminary observation is extremely relevant as it presupposes that the features shared by face-to-face and movie conversation also serve similar functions (Biber 1988) and, consequently, have similar textual Dimensions (Biber, Conrad and Reppen 1998). Table 5 clearly illustrates this: face-to-face and movie conversation are quantitatively and qualitatively similar since they have four Dimensions out of five in common and display a positive score with respect to Dimensions 1 and 4, and a negative score with respect to Dimensions 2 and 3. The only Dimension they differ on is Dimension 5.
Table 5²¹. Multi-Dimensional Analysis of face-to-face and movie conversation.
21 The label Variable stands for the 5 Dimensions (or Factors, i. e. dim1-5 in the table) taken into account; N for the number of texts (or sub-corpora) in the two corpora considered; Mean for the mean (average) frequency of items (the higher it is, the more frequent the items are); Std Dev for standard deviation, namely, a measure of the spread of the distribution; and Minimum and Maximum for the minimum and maximum frequencies of items respectively. Biber, Conrad and Reppen (1998: 280) explain that in all Multi-Dimensional studies “frequencies are standardized to a mean of 0.0 and a standard deviation of 1.0 before factor scores are computed. This process translates the scores for all features to scales representing standard deviation units, thus, regardless of whether a feature is extremely rare or extremely common in absolute terms, a standard score of +1 represents one standard deviation unit above the mean score for the feature in question. That is, standardized scores measure whether a feature is common or rare in a text relative to the overall average occurrence of that feature. The raw frequencies are transformed to standard scores so that all features on a factor will have equivalent weights in the computation of Dimension scores. If this process was not followed, extremely common features would have much greater influence than rare features on the Dimension scores.”
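The standardization described in the quotation can be sketched in a few lines of Python. This is a minimal illustration, not the SAS procedure used in the study. It assumes the sample standard deviation (the source does not say whether the sample or the population formula applies), and it assumes the common Multi-Dimensional convention of adding the standard scores of positively weighted features and subtracting those of negatively weighted ones. The feature names and frequencies are invented.

```python
from statistics import mean, stdev


def standardize(values):
    """Rescale frequencies to a mean of 0.0 and a standard deviation of 1.0."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation
    if s == 0:
        raise ValueError("cannot standardize a feature with no variation")
    return [(v - m) / s for v in values]


def dimension_scores(frequencies, positive, negative):
    """Return one score per text: sum of z-scores of the positive features
    minus sum of z-scores of the negative features (assumed convention)."""
    z = {feature: standardize(vals) for feature, vals in frequencies.items()}
    n_texts = len(next(iter(frequencies.values())))
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(n_texts)
    ]


# Invented frequencies per 1,000 words for three hypothetical texts.
frequencies = {
    "private_verbs": [10.0, 20.0, 30.0],
    "nouns": [300.0, 200.0, 100.0],
}
scores = dimension_scores(frequencies, positive=["private_verbs"], negative=["nouns"])
print(scores)  # [-2.0, 0.0, 2.0]
```

In the invented data, nouns are roughly ten times more frequent than private verbs in absolute terms, yet after standardization both features move each text's score by the same amount, which is the point the quotation makes about equivalent weights.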
This means that the traits they share also serve similar functions: both face-to-face and movie conversation are characterized by involved production, non-narrative concerns, situation-dependent reference, and a low level of persuasion²². These findings, which are illustrated in Chart 1 and are examined in detail in the next sections, offer a new perspective on movie language by bringing to light its similarity with face-to-face conversation.
Chart 1. Multi-Dimensional Analysis of face-to-face and movie conversation.
22 As illustrated in Chapter 1, Factor 1 (dim1 in Table 5) displays informational versus involved production, namely, a Dimension which marks “high informational density and exact informational content versus affective, interactional, and generalized content” (Biber 1988: 107). Factor 2 (dim2) represents narrative versus non-narrative concerns, a Dimension which “can be considered as distinguishing narrative discourse from other types of discourse” (Biber 1988: 109). Factor 3 (dim3) concerns explicit versus situation-dependent reference, a Dimension which distinguishes “between highly explicit, context-independent reference and nonspecific, situation-dependent reference” (Biber 1988: 110). Factor 4 (dim4), the only Dimension which is exclusively positive, reveals the overt expression of persuasion, a Dimension which “marks the degree to which persuasion is marked overtly” (Biber 1988: 111). Finally, Factor 5 (dim5) reflects abstract versus non-abstract information, a Dimension which “seems to mark informational discourse that is abstract, technical, and formal versus other types of discourse” (Biber 1988: 113).
3.2 Informational vs. Involved Production (Dimension 1)
The linguistic features which characterize the Dimensions have either a negative or a positive weight on the Dimensions themselves and can be understood in terms of low and high frequency. In Dimension 1, for example, which reflects informational (negative) vs. involved (positive) production (cf. Biber 1988), features such as nouns, prepositional phrases, attributive adjectives, word length, and type-token ratio (abbreviated to n, prep, adj_attr, wrdlngth, and typetokn in the tables) have a negative weight. This implies that if these co-occurring linguistic features are frequent in a text, they affect the negative side of the Dimension and, consequently, the production of the text is more informational (negative) than involved (positive). This happens because a high frequency of nouns, the main bearers of referential meaning, is a sign of high density of information. Prepositional phrases and attributive adjectives integrate information in a text. Word length marks a high density of information, for longer words convey more specialized meaning than shorter words. Finally, type-token ratio reflects the number of different lexical items in a text: the higher the type-token ratio of a text, i. e. the greater its variety of lexical words, the more specialized the meanings of the words in that text (cf. Biber 1988: 104–105). For example, written texts, which contain a high number of occurrences of the linguistic features just mentioned, are informational texts that present information as concisely and precisely as possible (cf. Biber, Conrad and Reppen 1998). Conversely, spoken texts, which are characterized by features which have a positive weight on Dimension 1 (i. e. which affect the positive side of the Dimension), are involved, rather than informational, texts. This can also be said about movie conversation: as Table 5 and Chart 2 illustrate, both face-to-face and movie conversation belong to the same textual dimension, since they are both characterized by a large positive mean score reflecting a production which is positive and, consequently, involved.
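The type-token ratio mentioned above can be made concrete with a small Python sketch. It is a simplified illustration rather than the measure computed by the tagger used in the study: the tokenization rule (lower-cased runs of letters and straight apostrophes) and the function name type_token_ratio are assumptions made for the example. Because the ratio falls as texts get longer, actual studies normally compute it over a fixed number of words, a refinement this sketch leaves out.

```python
import re


def type_token_ratio(text):
    """Number of distinct word forms (types) divided by running words (tokens)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    return len(set(tokens)) / len(tokens)


# A turn from Extract 6b, typed with straight apostrophes:
# 11 tokens, 9 types ("no" occurs three times).
utterance = "No no no I don't think that's such a good idea"
print(round(type_token_ratio(utterance), 2))  # 0.82
```

A repetitive conversational turn such as this one scores lower than a stretch of text of the same length in which every word is different, which is why a low ratio is associated with the involved side of Dimension 1.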
Chart 2. Dimension 1: Informational vs. Involved production.
This textual and qualitative similarity, determined by the type of linguistic features that the two corpora share, is also quantitative: the two conversational domains not only share the same polarity²³, but they also have extremely similar mean scores (35.04 and 35.31, respectively) both on the Dimension and on the single items involved. Furthermore, in both corpora, the Dimension with the highest score of all is Dimension 1. The fact that both corpora are marked positively on Dimension 1 reflects a dense use of those items which have a positive weight on Dimension 1 and which, consequently, contribute to an interpersonal dialogic character and to an affective, interactional, and generalized context typical of spoken discourse. In particular, as Table 6 demonstrates (cf. bold), both face-to-face and movie conversation present the highest frequency in verbs (118.21 vs. 117.23 respectively); a relatively high frequency of first person pronouns and possessives (65.80 vs. 72.33), second person pronouns and possessives (35.37 vs. 53.36), private verbs (29.49 vs. 24.40), and it pronouns (24.60 vs. 19.00); and a moderately high frequency of discourse particles (14.00 vs. 12.60), demonstrative pronouns (13.14 vs. 11.56), and emphatic adverbs and qualifiers (e. g. just, really, so: 11.83 vs. 9.10).
23 The term polarity is used here to refer to the type of mean score which can be either positive or negative.
Table 6. Linguistic features with a positive weight on Dimension 1 in the LSAC and in the AMC.
LINGUISTIC FEATURES {POSITIVE DIMENSION 1}                  FACE-TO-FACE CONVERSATION   MOVIE CONVERSATION
Verb (uninflected present, imperative & third person)       118.21                      117.23
First person pronoun / possessive                           65.80                       72.33
Second person pronoun / possessive                          35.37                       53.36
Private Verb (e. g. believe, feel, think)                   29.49                       24.40
Pronoun ‘it’                                                24.60                       19.00
Discourse Particle (e. g. now)                              14.00                       12.60
Demonstrative Pronoun                                       13.14                       11.56
Adverb / Qualifier – Emphatic (e. g. just, really, so)      11.83                       9.10
‘That’ Deletion                                             9.86                        8.56
Coordinating conjunction – clausal connector                8.59                        8.43
Modals of possibility (can, may, might, could)              8.35                        8.10
Nominal Pronoun (e. g. someone, everything)                 7.86                        7.73
Stranded Preposition                                        3.31                        6.56
Verb ‘Do’                                                   3.24                        5.66
Verb ‘Be’ (uninflected present tense, verb and auxiliary)   3.21                        4.30
Wh- question                                                3.16                        3.93
Wh- Clause                                                  2.57                        3.20
Adverbial – Hedge (e. g. almost, maybe)                     2.50                        2.46
Contraction                                                 2.44                        2.33
Adverb / Qualifier – Amplifier (e. g. absolutely, entirely) 2.39                        1.53
Subordinating Conjunction – Causative (e. g. because)       2.30                        1.46
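As a rough check on the comparison drawn from Table 6, the following Python sketch computes the difference between the two corpora for the eight most frequent features and tests whether both corpora rank them in the same order. Only the mean scores are taken from the table; the dictionary keys, function names and the idea of comparing rank orders are illustrative assumptions, not part of the original analysis.

```python
# (LSAC mean, AMC mean) per 1,000 words, copied from the first eight rows of Table 6.
TABLE_6 = {
    "verb (present)": (118.21, 117.23),
    "first person pronoun/possessive": (65.80, 72.33),
    "second person pronoun/possessive": (35.37, 53.36),
    "private verb": (29.49, 24.40),
    "pronoun it": (24.60, 19.00),
    "discourse particle": (14.00, 12.60),
    "demonstrative pronoun": (13.14, 11.56),
    "emphatic adverb/qualifier": (11.83, 9.10),
}


def differences(table):
    """AMC mean minus LSAC mean for every feature, rounded to two decimals."""
    return {feature: round(amc - lsac, 2) for feature, (lsac, amc) in table.items()}


def rank_order(table, column):
    """Feature names sorted from most to least frequent in one corpus (0 = LSAC, 1 = AMC)."""
    return sorted(table, key=lambda feature: table[feature][column], reverse=True)


diffs = differences(TABLE_6)
largest_gap = max(diffs, key=lambda feature: abs(diffs[feature]))
print(largest_gap, diffs[largest_gap])
print(rank_order(TABLE_6, 0) == rank_order(TABLE_6, 1))
```

For these eight rows the widest gap is in second person pronouns and possessives, and the two corpora order the features identically from most to least frequent.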
The following mean scores and extracts illustrate the presence in both face-to-face and movie conversation of the linguistic features which have a positive weight on Dimension 1 and, consequently, characterize a discourse which is expressive of thoughts, private attitudes and emotions and which is highly interactive. More specifically, Extracts 6a and 6b offer some examples of verb forms, tenses and types which have very similar mean scores in both corpora: verbs (uninflected presents, imperatives & third persons) 118.21 (LSAC) vs. 117.23 (AMC); private verbs 29.49 (LSAC) vs. 24.40 (AMC).
Extract 6a from the LSAC:
Speaker1: Uh, yeah. I prefer some to others. I think I prefer turkey
to chicken and I really like a nice turkey that you can just, I
ripped one apart very much so the other day.
[…]
Speaker1: This is good. You threw a hell of a New Year’s Eve party.
Your house is, just looks just lovely.
Speaker2: It does.
Speaker3: Well thank you William.
Speaker1: <unclear>
Speaker2: Beautiful. Have you got name tags on everything?
Speaker1: Tomorrow.
Speaker2: Tomorrow. I still have my name tag.
Speaker1: That’s why we’ll get up at six ten. To put the name tags on.
Speaker2: Do you know what she does? She writes the names on like
a piece of paper on the bottom in tiny, tiny shorthand cause
she knows that we’re gonna see them and look at them.
Extract 6b from the AMC:
Speaker1: Of course, councillor. But might I advise a level of discretion
concerning specific details. We do not wish to start a
panic.
Speaker2: Quite right. A panic is not what anyone wants. What about
you, Captain, what would you advise?
Speaker3: The truth. No one will panic. Because there is nothing to
fear. That army will never reach the gates of Zion.
Speaker2: What makes you so sure?
Speaker3: Consider what we have seen, Councillor. Consider that in
the past 6 months we have freed more minds than in 6 years.
This attack is an act of desperation. I believe very soon the
prophecy will be fulfilled and this war will end.
[…]
Speaker1: Mrs. Larson? It uh it won’t be much longer, Mrs. Larson.
Speaker2: Oh well is he in a lot of pain?
Speaker1: No No no. There will be no more pain for your husband.
He’s heavily sedated.
Speaker2: OK I think I’m gonna go, send little Hal in now.
Speaker1: No no no I don’t think that’s such a good idea. With all the
painkillers uh the reverend’s not exactly himself.
Speaker2: Look I think my boy has a right to say goodbye to his father
I mean the man means everything in the world to him.
Extracts 7a and 7b show some examples of the pronouns found in the corpora which are characterized by the following mean scores: first person pronouns and possessives 65.80 (LSAC) vs. 72.33 (AMC); second person pronouns and possessives 35.37 (LSAC) vs. 53.36 (AMC); pronoun it 24.60 (LSAC) vs. 19.00 (AMC); demonstrative pronouns 13.14 (LSAC) vs. 11.56 (AMC).
Extract 7a from the LSAC:
Speaker1: Would you like some more?
Speaker2: No thank you I’m fine.
Speaker1: How about you?
Speaker3: No thanks.
[…]
Speaker1: Are you sure you don’t want some?
Speaker2: No
Speaker1: It’s Christmas Eve. That’s when you have to have champagne.
Speaker3: »Yeah. <unclear> and Brian … I’ve got a chauffeur. You
know, he’s driving.
Extract 7b from the AMC:
Speaker1: How you doin’?
Speaker2: What’s going on?
Speaker1: I just was in the neighborhood got off work early.
Thought maybe you wanted to get a bite to eat.
Speaker2: Oh, that’s very sweet. What a nice surprise.
Speaker1: Oh shoot I forgot to change my shoes.
Speaker2: That’s OK you don’t have to change. You know I can’t resist
a man in nurse’s shoes.
Speaker1: I know but I got sneakers in my backpack I’m just gonna
change. It’ll just take a second.
Extracts 8a and 8b give some examples of discourse particles, emphatic adverbs and qualifiers which are present in the corpora with the following mean scores: discourse particles 14.00 (LSAC) vs. 12.60 (AMC); emphatic adverbs and qualifiers 11.83 (LSAC) vs. 9.10 (AMC).
Extract 8a from the LSAC:
Speaker1: No I mean they would have to be his half brother.
Speaker2: So he’s, he’s their father. I mean dad’s father is their father.
Speaker1: So he had half brother and sister.
Speaker2: Oh I said step huh?
Speaker1: Yeah…
Speaker2: Well half, what do you call them? They’re half brothers.
Extract 8b from the AMC:
Speaker1: Look, seriously. Miranda Priestly is a huge deal. I bet a million
girls would kill for that job.
Speaker2: Yeah, great. The thing is I’m not one of them.
Speaker3: Look, you gotta start somewhere, right? I mean, look at this
dump Nate works in. I mean, come on. Paper napkins?
Hello.
Speaker4: Yeah. And Lily, she works at that gallery doing, uh, you
know…
Oh, I’m sorry. What exactly is it that you do anyway?
Furthermore, as Table 7 demonstrates, moderately similar scores were found for both corpora on the features which have a positive weight on Dimension 1. First of all, they display similarly high mean scores of that deletions, coordinating conjunctions and clausal connectors, modals of possibility (can, may, might, could), and nominal pronouns (e. g. someone, everything). Second, face-to-face and movie conversation also share a relatively low frequency of stranded prepositions, verbs do and be, wh-questions, wh-clauses, hedges (e. g. almost, maybe), contractions, amplifiers (e. g. absolutely, entirely), and causative subordinating conjunctions (e. g. because).
Table 7. Linguistic features which have a positive weight on Dimension 1 in the LSAC and AMC.
LINGUISTIC FEATURES {POSITIVE ON DIMENSION 1}               FACE-TO-FACE CONVERSATION   MOVIE CONVERSATION
‘That’ Deletion                                             9.86                        8.56
Coordinating conjunction – clausal connector                8.59                        11.56
Modals of possibility (can, may, might, could)              8.35                        8.43
Nominal Pronoun (e. g. someone, everything)                 7.86                        8.10
Stranded Preposition                                        3.31                        3.93
Verb ‘Do’                                                   3.24                        4.30
Verb ‘Be’ (uninflected present tense, verb and auxiliary)   3.21                        3.20
Wh- question                                                3.16                        5.66
Wh- Clause                                                  2.57                        2.46
Adverbial – Hedge (e. g. almost, maybe)                     2.50                        1.53
Contraction                                                 2.44                        6.56
Adverb / Qualifier – Amplifier (e. g. absolutely, entirely) 2.39                        2.33
Subordinating Conjunction – Causative (e. g. because)       2.30                        1.46
These items are associated with an involved (positive) Factor and contribute to a context that can be described as oral, affective, fragmented, interactional, and generalized. Private verbs, for example, are used to express private attitudes, emotions and thoughts, while present tense forms are employed to indicate actions taking place in the immediate context of the action. These two features bear the most weight on this Dimension since they are indicators of a verbal rather than a nominal style (cf. Biber 1988: 105). First and second person pronouns are also highly present in interactive discourse. It, demonstrative and indefinite pronouns stand for unspecified nominal referents. Finally, discourse particles are generalized markers of information which help maintain textual coherence (cf. Biber 1988: 104–108).
Tables 8 and 9 fully illustrate the mean scores, standard deviations, minimum and maximum scores (i. e. the means procedure) of the items analyzed and also of those which have a negative weight on Dimension 1. It is not surprising that features such as nouns, prepositional phrases, attributive adjectives, word length, and type-token ratio (respectively n, prep, adj_attr, wrdlngth, and typetokn in the tables), which affect the negative side of Dimension 1, have a relatively low frequency in both corpora. As mentioned above, this implies that the production of a text which displays a low frequency of the items just listed is more involved (positive) than informational (negative). A low frequency of nouns marks low density of information; it is worth noting that, although the occurrence of nouns is high here (186.4), this reflects the norm in spoken language (cf. face-to-face conversation = 137.4 in Biber 1988: 264) and it is relatively low compared to their frequency in written registers (cf. press reportage = 220.5; press editorials = 201.0; press reviews = 208.3; official documents = 206.5; academic prose = 188.1; general fiction = 160.7; in Biber 1988: 247–269). Similarly, a low frequency of prepositional phrases and attributive adjectives indicates that information is spread thinly through the text. Furthermore, shorter words (cf. word length) convey less specialized meaning than longer words, and low variation in vocabulary (cf. type-token ratio) reflects the use of words that do not have very specific meanings. These key features (nouns, prepositional phrases, attributive adjectives, word length and type-token ratio) are marked in bold in the tables.
Table 8. Means procedure of Dimension 1 in the LSAC.
Table 9. Means procedure of Dimension 1 in the AMC.
The following extracts show the frequency of these features in the two
corpora (see bold), and their frequent co-occurrence both in face-to-
face and movie conversation.
Speaker1: Did you manage?
Speaker2: Yeah.
Speaker1: Well, how, that’s very clever of you. I’ve been trying to open
one for.
Speaker2: Do you have fingernails?
Speaker1: Yeah you have fingernails. You should be able to get that
one?
Speaker2: Oh this is great.
Speaker1: Yeah, yeah, except they are not really very good quality. But
at least they are very small [so you]
Speaker2: [Yeah].
Speaker1: You don’t need much space for that.
Speaker2: Um, … maybe I’ll let you tell me how, how it opens up.
Speaker1: Uh, I think it’s just … pulling here right?
Speaker2: Uh huh.
Speaker1: And then I guess you need to just take … take them, I don’t
know if you want to take them all out or just leave them
like that. There’s a lot of <unclear>.
Speaker1: Hey Russ! Rusty. What’s up man?
Let me ask you a question now. Are you incorporated?
Roll, okay, if you are not, you should really think about it,
cos I was talking to my manager last night…
Speaker2: Bernie?
Speaker1: No, not Bernie my business manager. Actually. You know
they’re both named Bernie. Anyway, he was telling me that
because of what we do, can be considered like research. For
like a future. Gig or whatever. I can totally make it a tax
write-off, the one thing is and this is, like, just his thing,
and it’s stupid. But. I’d have to pay you by check. What?
Let’s, or we could just stick to cash. Yeah, let’s… yeah, let’s
just stick to cash.
3.3 Narrative vs. Non-Narrative Concerns (Dimension 2)
Face-to-face and movie conversation also display extremely similar scores on Dimension 2, which reflects narrative (positive) vs. non-narrative (negative) concerns (cf. Biber 1988): they both have a negative score (i. e. -0.84 and -0.97 respectively, cf. Chart 3). In linguistic terms, this means that they are both non-narrative text types which mark immediate time and attributive nominal elaboration.
Chart 3. Dimension 2: Narrative vs. Non-narrative concerns.
Face-to-face conversation has a low occurrence of verbs in the perfect aspect (perfects in the tables), public verbs (e. g. assert, complain, say, report, declare – pub_vb), past tense verbs (pasttnse), and third person pronouns except it (pro3) (cf. Table 10), which are all devices of narrative discourse and have a positive weight on Dimension 2 (cf. Biber 1988: 109). Indeed, past tense and perfect aspect mark past events; public verbs are used to indicate indirect, reported speech; third person pronouns (except it) are used to refer to specific animate referents described in narrative discourse (cf. Biber 1988: 109).
Table 10. Linguistic features of Dimension 2 in the LSAC.
The following extract – which is a randomly chosen example from the LSAC – shows that there are no, or very few, perfects, past tenses, third person pronouns except it (the examples are marked in bold); on the other hand, the present tense which, together with attributive adjectives, has a negative weight on this Dimension, is most frequently used (some examples are marked in italics). These elements have a negative score on Factor 2 (note 24); consequently, their high frequency in this face-to-face conversation determines its negative (i. e. non-narrative) Dimension 2 (note 25).
24 Biber (1988: 109) explains this by highlighting that “a discourse typically reports events in the past or deals with more immediate matters, but does not mix the two”.
25 In order to highlight the interrelation of the different components within the dimensions, it is worth noting that these items are the same items which have a positive weight on Factor/Dimension 1 (see Section 3.2).
Extract 10a from the LSAC:
Speaker1: It’s a basement.
Speaker2: Half windows?
Speaker1: Yeah.
Speaker2: You can visit?
Speaker1: Absolutely. I’ve got plenty of space now, boy. And a Jeep.
Speaker2: <unclear>
Speaker1: Madge. Oh she will be in her glory.
Speaker2: It’s great.
Speaker1: She won’t have to go in the elevator.
Speaker2: She’ll be able to get outside and outdoors.
Speaker1: How long until you will be living in it?
Speaker2: I’m gonna move in, in a week or so after I get back.
Speaker1: What is the room that has the wood paneling on the walls?
Speaker2: That’s the basement.
Speaker1: Oh. It’s got windows.
Speaker2: Yeah. It’s actually kind of raised. The whole thing is raised
up so it’s
Speaker1: Yeah I see steps going up.
Does it rain a lot?
Speaker2: It rains a lot in Chicago but the water, the basement doesn’t
get any water.
Speaker1: I’m sure he looked into that.
Speaker2: That was one of the things you have to check for.
<unclear> Santa Barbara.
Speaker1: Oh yes. Yeah I just found out that a friend of mine is going
to the University of Chicago to get her Ph D. I really want
to go visit her. Maybe I’ll come out and <unclear>.
Speaker2: <unclear>
Speaker1: Oh is she?
Speaker2: Yeah.
Speaker1: Oh good.
Speaker2: <unclear>
Speaker1: I understand <unclear> gonna be in nineteen ninety-four.
Speaker2: I hope we won’t get any student loans after ninety-six.
Speaker1: <unclear> stretch it out.
Speaker2: I won’t be able to <unclear> my student loans after ninety-
six. <unclear>
Speaker2: Push his arm.
Speaker1: Huh?
Speaker2: Push his arm.
Speaker1: Yeah.
Speaker2: But it’s fun anyway.
Speaker2: That’s what’s important. If you like it.
Speaker1: Yeah.
In much the same way, movie dialogues (cf. Table 11) display very few occurrences of past tense verbs, third person pronouns, verbs in the perfect aspect and public verbs (e. g. assert, complain, say).
Table 11. Linguistic features of Dimension 2 in the AMC.
The following example from the AMC illustrates the low frequency of these linguistic items (in bold) and highlights (in italics) those which have a negative score on Dimension 2:
Extract 10b from the AMC:
Speaker1: Mrs. Larson? It uh it won’t be much longer, Mrs. Larson.
Speaker1: No no no. There will be no more pain for your husband He’s
heavily sedated.
Speaker2: OK I think I’m gonna go, send little Hal in now.
Speaker1: No no no I don’t think that’s such a good idea. With all the
painkillers the uh reverend’s not exactly himself.
Speaker2: Look I think my boy has a right to say goodbye to his father
I mean the man means everything in the world to him.
Although neither face-to-face nor movie conversation is mainly
characterized by a high occurrence of past tense verbs, these tenses can
appear in conversation, especially when the talk takes a narrative or
reporting slant, as extracts 11a and 11b demonstrate (see bold). It is
worth noting that, even under these circumstances, the features which
contribute to an interactional and interpersonal dialogic character
(highlighted in Dimension 1, cf. Section 3.2) are still recognizable.
And so he just young, this popular young woman coming in the neigh-
borhood that was a young teacher and he was a popular guy. You know
everybody was crazy about him. After the first wife he said oh she did
him so bad he wouldn’t marry again. But then when he saw this young
girl, everybody was inviting her into their homes and to different par-
ties and then she was all modern coming from Houston, you know so
he snapped her up and married her. He asked my dad for her hand
very politely. And daddy gave him a long lecture.
Yeah, Nate said it was great. He actually… He applied here, but they
wanted someone with more experience.
These extracts show that the various Dimensions and Factors of Multi-
Dimensional Analysis are closely related to one another. The linguistic
items which are not frequent and have a positive weight on Dimension
2, for example, are compensated by the high frequency of other items
which play an important part in Dimension 1: as Tables 12 and 13
illustrate, the low occurrence of past tense verbs (pasttnse) and of verbs
in the perfect aspect (perfects) is compensated by the high occurrences of
the present forms (pres). In much the same way, the low mean score of
public verbs (pub_vb) is compensated by a high mean score of private
verbs (prv_vb), as the low frequency of third person pronouns except it
(pro3) is compensated by the high frequency of first and second person
pronouns and possessives (pro1 and 2) and it pronouns (it).
Table 12. Comparisons between some of the linguistic features present in the LSAC which compensate each other in Dimensions 1 and 2.
LINGUISTIC ITEMS AND THEIR MEAN SCORES
DIMENSION 1                                      DIMENSION 2
present forms                            118.21  past tense verbs                  38.47
                                                 perfect aspect                     5.09
private verbs                             29.49  public verbs                       6.56
first person pronouns and possessives     65.80
second person pronouns and possessives    35.37  third person pronouns except it   31.94
it pronouns                               24.60
Table 13. Comparisons between some of the linguistic features present in the AMC which compensate each other in Dimensions 1 and 2.
LINGUISTIC ITEMS AND THEIR MEAN SCORES
DIMENSION 1                                      DIMENSION 2
present forms                            117.23  past tense verbs                  31.46
                                                 perfect aspect                     8.80
private verbs                             24.40  public verbs                       5.20
first person pronouns and possessives     72.33
second person pronouns and possessives    53.36  third person pronouns except it   24.20
it pronouns                               19.00
This compensation is an illustration of the fact that Multi-Dimensional Analysis can predict the absence or presence of correlated linguistic features (Biber 1988): texts which have many present forms, for example, are not likely to have many past forms; moreover, such texts, which are also expected to lack public verbs, are also likely to have many private verbs, and vice-versa.
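The compensation described above can be checked with simple arithmetic. The sketch below is illustrative only and is not part of Biber's (1988) procedure: it copies the mean scores from Tables 12 and 13 and, for each of the three pairings named in the text, divides the summed Dimension 1 means by the summed Dimension 2 means. The grouping in `PAIRS`, the keys `pro1` and `pro2`, and the helper name `compensation_ratios` are assumptions made for this example.

```python
# Mean scores copied from Tables 12 (LSAC) and 13 (AMC).
MEANS = {
    "LSAC": {"pres": 118.21, "pasttnse": 38.47, "perfects": 5.09,
             "prv_vb": 29.49, "pub_vb": 6.56,
             "pro1": 65.80, "pro2": 35.37, "it": 24.60, "pro3": 31.94},
    "AMC": {"pres": 117.23, "pasttnse": 31.46, "perfects": 8.80,
            "prv_vb": 24.40, "pub_vb": 5.20,
            "pro1": 72.33, "pro2": 53.36, "it": 19.00, "pro3": 24.20},
}

# Hypothetical pairing (Dimension 1 features, Dimension 2 features),
# following the three compensations described in the prose.
PAIRS = [
    (("pres",), ("pasttnse", "perfects")),
    (("prv_vb",), ("pub_vb",)),
    (("pro1", "pro2", "it"), ("pro3",)),
]


def compensation_ratios(means):
    """Return, per pairing, summed Dimension 1 means over Dimension 2 means."""
    ratios = {}
    for dim1, dim2 in PAIRS:
        label = "+".join(dim1) + " / " + "+".join(dim2)
        ratios[label] = sum(means[f] for f in dim1) / sum(means[f] for f in dim2)
    return ratios


if __name__ == "__main__":
    for corpus, means in MEANS.items():
        for label, ratio in compensation_ratios(means).items():
            print(f"{corpus:5} {label:28} {ratio:.2f}")
```

A ratio above 1 means that the Dimension 1 features outweigh their Dimension 2 counterparts. With the figures above, this holds for every pairing in both corpora, which is the pattern the text describes as compensation.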
3.4 Explicit vs. Situation-Dependent Reference (Dimension 3)
Face-to-face and movie conversation display extremely similar linguistic

variables also with regard to Dimension 3, namely, explicit (positive)
vs. situation-dependent (negative) reference (cf. Biber 1988). The con-
versational domains under investigation both have a negative mean
score (-7.04 and -5.72 respectively, see Chart 4), which implies that
they both rely on situation-dependent reference (cf. Biber 1988).
Chart 4. Dimension 3: Explicit vs. Situation-dependent reference.

To enter into detail, in face-to-face conversation the highest occurrence of features on this Dimension regards place and time adverbs (pl_adv and tm_adv) and other adverbs which have a negative weight on this Factor and mark situation-dependency (note 26). Conversely, the lowest mean scores regard items which have a positive weight on Dimension 3. Table 14 demonstrates that there is a low frequency of wh-pronouns that function as a relative clause in object position (rel_obj in the tables), wh-pronouns that function as a relative clause in subject position (rel_subj), and wh-pronouns that function as a relative clause in object position with prepositional fronting (rel_pipe). These pronouns are all usually used as devices for the “explicit, elaborated indication of referents in a text” (cf. Biber 1988: 110). Face-to-face conversation also displays a low percentage of phrasal connectors (p_and) and nominalization (n_nom), which indicate a type of referential and informational discourse.
26 This means that they are usually employed for references outside the text (Biber 1988: 110).
Table 14. Linguistic features of Dimension 3 in the LSAC

In Extract 12a it is clear that understanding the conversation depends highly on reference to the context and, once again, it can be seen that the various Dimensions and Factors of Multi-Dimensional Analysis are interrelated: the situation-dependent reference is determined here not only by the use of the adverb tomorrow, which, as an adverb of time, has a negative weight on Dimension 3, but also by the presence of items (e. g. pronouns) which have a positive weight on Dimension 1.
Extract 12a from the LSAC:
Speaker1: Oh, she wants me to save them for tomorrow.
Speaker2: That was very good, very good. I especially like the ones
without sugar that you made for me.
As Table 15 demonstrates, movie conversation also displays extremely similar mean scores regarding the linguistic features associated with Dimension 3 and found in face-to-face conversation.
Table 15. Linguistic features of Dimension 3 in the AMC.
This means that, analogously to face-to-face conversation, movie con-
versation has a negative score on Dimension 3 and, thus, by adopting
adverbs and pronouns, it relies on situation-dependent reference. These
features can be seen in the following extract, in which the adverbs and
pronouns are marked in bold:
Speaker1: I have a busy day today. Drinks then dinner. Don’t wait up
will you, darling?
Speaker2: I stopped waiting a long time ago, George
Speaker1: Oh and erm, that lunch tomorrow, cancel that too, will you?
Speaker2: Problems?
Speaker1: I doubt it. But Slever Key won’t stop calling. You know sci-
entists. They’re worse than models. You have to coddle them
all the time, like little children.
The interrelation of Dimensions 1 and 3 is presented in Tables 16 and

17: the relatively low mean score of the features which usually indicate
elaboration of explicit reference (see Dimension 3 in the tables) is com-
pensated for, both in movie and face-to-face conversation, by a high
frequency of elements which contribute to situation-dependent reference
(see Dimension 1 in the tables) and also to an involved and interactive
production (namely to Dimension 3 and Dimension 1 respectively).
Table 16. Comparisons between compensating linguistic features in Dimensions 1 and 3 (in the LSAC).
DIMENSION 1                                      DIMENSION 3
first person pronouns and possessives     65.80  Singular noun – nominalization                      10.62
second person pronouns and possessives    35.37  Wh pronoun – relative clause – subject position      0.90
it pronouns                               24.60  Coordinating conjunction – phrasal connector         0.59
demonstrative pronouns                    13.14  Wh pronoun – relative clause – object position       0.32
                                                 Wh pronoun – relative clause – object position
                                                 with prepositional fronting (‘pied piping’)          0.07
Table 17. Comparisons between compensating linguistic features in Dimensions 1 and 3 (in the AMC).
DIMENSION 1                                      DIMENSION 3
first person pronouns and possessives     72.33  Singular noun – nominalization                      13.03
second person pronouns and possessives    53.36  Wh pronoun – relative clause – subject position      0.66
it pronouns                               19.00  Coordinating conjunction – phrasal connector         0.63
demonstrative pronouns                    12.60  Wh pronoun – relative clause – object position       0.46
                                                 Wh pronoun – relative clause – object position
                                                 with prepositional fronting (‘pied piping’)          0.13
3.5 Overt Expression of Persuasion (Dimension 4)
Also in terms of overt expression of persuasion, namely Dimension 4,

which is characterized only by features with positive weight (cf. Biber 1988;
Biber, Conrad and Reppen 1998), face-to-face and movie conversation
have a very similar mean score: 0.60 and 0.64 respectively (cf. Chart 5).
Chart 5. Dimension 4: Overt expression of persuasion.
In linguistic terms, these low mean scores indicate that both the conversational domains under investigation contain a low percentage of elements that are typical of persuasion. As Tables 18 and 19 respectively display, and as all the extracts from the corpora above confirm, both face-to-face and movie conversation have a low percentage of infinitive verbs (inf in the table), modals of prediction (will, would, shall – prd_mod), suasive verbs (e. g. ask, command, insist – sua_vb), subordinating conjunctions – conditionals (e. g. if, unless – sub_cnd), modals of necessity (e. g. ought, should, must – nec_mod), and adverbs within auxiliary (i. e. splitting aux-verb – spl_aux) which usually carry weight in persuasive language (Biber 1988). Infinitive verbs, for example, can be used as adjective and verb complements in expressions like happy to do it; they encode “the speaker’s attitude or stance towards the proposition encoded in the infinitival clause” (Biber 1988: 111). Modals are direct pronouncements that certain events will (prediction), should (obligation or necessity), can or might (possibility) occur. Suasive verbs, for instance, imply intentions to make an event occur and conditional subordination specifies the conditions required to do so. Split auxiliaries (i. e. auxiliaries which occur with an adverb which is placed between them and the main verb, like can often do) are often modals, which explains why these features have weight on this dimension (cf. Biber 1988: 111).
Table 19. Linguistic features of Dimension 4 in the AMC.
3.6 Abstract vs. Non-Abstract Information (Dimension 5)

The only noticeable difference between face-to-face and movie conversation that emerges from Multi-Dimensional Analysis concerns abstract vs. non-abstract information, namely Dimension 5 (cf. Biber 1988): movie dialogue has a positive mean score (1.66) and is consequently labeled as abstract, whereas face-to-face conversation has a negative mean score (-2.04) and is consequently labeled as non-abstract. Despite this polar (positive-negative) difference, however, it can be said that the two conversational domains are still rather similar, because the span difference (note 27) between them is relatively slight (3.70). Chart 6 clearly illustrates their closeness.
27 I. e. the maximum distance between positive and negative mean scores, here: 2.04+1.66=3.70.
Chart 6. Dimension 5: Abstract vs. Non-abstract information.
In terms of linguistic features, as Tables 20 and 21 and the extracts provided above show, this similarity is further proven by the rather low percentage of agentless passive verbs (agls_psv in the tables), passive verbs + by (by_pasv), and passive postnominal modifiers (whiz_vbn) found in both conversational domains. These forms have a positive weight on this dimension for they are used to reduce emphasis on the entity doing the action of the verb (the agent), and to give prominence to the entity being acted on (the patient), which is generally an abstract referent (cf. Biber 1988: 112).
If we examine the figures in Tables 20 and 21, it also emerges that the main difference between face-to-face and movie conversation may be caused by the slightly higher presence in movies of conjuncts (e. g. however, therefore, thus – conjncts). These have positive weight on Dimension 5, which has no features bearing heavy negative weight (cf. Biber 1988). It is worth emphasizing though that both the corpora analyzed here have a relatively low mean score of conjuncts and subordinating conjunctions and, on the other hand, both favor coordinating conjunctions, as the following extracts clearly show.
Extract 13a from the LSAC:
Speaker1: And know it’s recording. And it’s showing you the time in
seconds how much you are, in minutes and hours how much
you are recording. And it also adds a program number for,
when you stop and start again it adds a number two, that
kind of thing. You don’t need to worry about that.
Speaker2: Okay.
Speaker1: But, uh, but, uh, one tricky thing is how you start recording.
I mean Jack made a mistake he thought he was recording
and he wasn’t once when we, when we were testing this
and he was supposed to know how to use it already so …
and these buttons are kind of hard to push also.
Speaker1: Just relax. And I want you to imagine that you’re on a beach.
Speaker2: OK.
Speaker1: It’s a warm day and the sun is just starting to set. And you’re
looking in the eyes of a woman, and you’re feeling her heart.
You’re seeing her soul. You’re feeling her spirit. That’s it. That’s
it. Excellent. Excellent.
3.7 Summary
In this chapter I have compared movie conversation to face-to-face

conversation by means of Multi-Dimensional Analysis. What has
emerged is a striking similarity between them, which contrasts strongly
with the recurrent view in the literature that movie language is artificial
and, thus, not likely to represent spoken language. More specifically, the
data have revealed that both the conversational domains have a positive
score as far as Dimensions 1 and 4 are concerned, and a negative
score with regard to Dimensions 2 and 3. On Dimension 1, namely,
Informational versus Involved Production, they both have a rather high
percentage of uninflected presents, imperatives, verbs in the third person,
private verbs, first and second person pronouns and possessives, which all
contribute to a high affective, interactional, and generalized content.
The total mean scores of the two corpora have also been proven to be
almost identical: 35.31 (movie conversation) vs. 35.04 (face-to-face
conversation).
In terms of Dimension 2, namely Narrative versus Non-narrative
Concerns, both face-to-face conversation and movie dialogues are char-
acterized by non-narrative concerns. This means that they are marked
by immediate time and attributive nominal elaboration. Indeed, a rela-
tively low percentage of past tense verbs, third person pronouns, verbs in
the perfect aspect, and public verbs have been found in both corpora.
Also in this case, the total mean scores of the two conversational do-
mains have been found to be extremely similar: -0.97 (movie conver-
sation) vs. -0.84 (face-to-face conversation).
With regard to Dimension 3, namely Explicit versus Situation-De-
pendent Reference, both the conversation domains rely on situation-
dependent reference. For instance, they both have a low percentage
(i. e. below 1%) of wh-pronouns functioning as relative clauses in ob-
ject position, as relative clauses in subject position, and as relative
clauses in object position with prepositional fronting. The total mean
scores of face-to-face and movie conversation are quite close, also on
this Dimension (-7.04 and -5.72 respectively).
With respect to Dimension 4, namely Overt Expression of Persua-
sion, the data have proven that, even though the two conversational
domains have positive scores, they do not have a high percentage of
infinitive verbs, modals of prediction, suasive verbs, subordinating con-
junctions, modals of necessity, and adverbs within the auxiliary, which
have weight on this factor. Their mean scores are rather low and again
extremely similar (face-to-face conversation: 0.60 vs. movie conversa-
tion: 0.64).
The only difference that has emerged from Multi-Dimensional
Analysis regards Dimension 5, namely Abstract versus Non-abstract In-
formation: movie conversation has a positive score (1.66) and has con-
sequently been defined as abstract, whereas face-to-face conversation
has a negative score (-2.04) and has been labeled as non-abstract. De-
spite this difference in polarity, however, it has been pointed out that
neither of the two conversational domains has a high score on fea-
tures which contribute to abstract information, which means that the
difference between them, which corresponds to 3.70, is fairly mini-
mal: a low percentage of agentless passive verbs, passive verbs + by, and
passive postnominal modifiers characterize both face-to-face and movie
conversation. As a consequence, Dimension 5 should not be consid-
ered as particularly important in differentiating between the two con-
versational domains. The main difference has been ascribed to the rela-
tively higher presence of adverbial conjuncts in the movies compared
to face-to-face conversation, namely 6.83 and 1.37 respectively, which
have positive weights on this factor.
Table 22 and Chart 7 sum up the similarities which have emerged
from Multi-Dimensional Analysis: Table 22 recapitulates the main fea-
tures characterizing the five dimensions investigated, whereas Chart 7
outlines the closeness of the mean scores of face-to-face and movie con-
versation.
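The comparison summarized above can be reproduced with a few lines of arithmetic. The following sketch is not part of the original analysis: it copies the mean scores reported in this chapter and computes, for each Dimension, the absolute gap between the two corpora and whether the two scores share the same polarity. The names `SCORES` and `compare` are hypothetical and serve only this illustration.

```python
# Mean dimension scores reported in this chapter: (LSAC, AMC).
SCORES = {
    1: (35.04, 35.31),
    2: (-0.84, -0.97),
    3: (-7.04, -5.72),
    4: (0.60, 0.64),
    5: (-2.04, 1.66),
}


def compare(lsac, amc):
    """Return (absolute gap, whether both scores share the same sign)."""
    gap = round(abs(lsac - amc), 2)
    same_sign = (lsac > 0) == (amc > 0)
    return gap, same_sign


if __name__ == "__main__":
    for dim, (lsac, amc) in SCORES.items():
        gap, same_sign = compare(lsac, amc)
        polarity = "same polarity" if same_sign else "opposite polarity"
        print(f"Dimension {dim}: gap {gap:.2f}, {polarity}")
```

On these figures, the gap on Dimension 5 equals the span difference of 3.70 given in Section 3.6, and Dimension 5 is the only Dimension on which the two corpora differ in polarity.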
Table 22. Summary of the features that emerged from Multi-Dimensional Analysis.
DIMENSION 1: (+) Involved Production (AMC & LSAC)

Both the AMC and LSAC have a positive score and are characterized by linguistic features
which contribute to affective, fragmented, interactional, and generalized context, e. g.:
– verbs (uninflected presents, imperatives and third persons)
– second person pronouns and possessives
– first person pronouns and possessives
– private verbs
– it pronouns
– discourse particles
DIMENSION 2: (-) Non-narrative Concerns (AMC & LSAC)

Both the AMC and LSAC have a negative score and are characterized by linguistic features
which contribute to immediate time and attributive nominal elaboration, e. g.:
– present tense
– attributive adjectives
DIMENSION 3: (-) Situation-Dependent Reference (AMC & LSAC)

Both the AMC and LSAC have a negative score and are characterized by linguistic features
which are usually employed for references outside the text, e. g.:
– place adverbs
– time adverbs
DIMENSION 4: (+) (Low) Overt Expression of Persuasion (AMC & LSAC)

Both the AMC and LSAC have a (low) positive score and are characterized by linguistic
features which usually carry weight in persuasive language, e. g.:
– infinitive verbs
– modals of prediction
– suasive verbs
– subordinating conjunctions – conditionals
– modals of necessity
– adverbs within auxiliary (splitting aux-verb)
DIMENSION 5: (+) Abstract Information (AMC) vs. (-) Non-abstract Information (LSAC)

The AMC has a positive score and contains a higher presence of linguistic features which
characterize abstract information (e. g. conjuncts), whereas the LSAC has a negative score.
Chart 7. Summary of the main charts that emerged from Multi-Dimensional Analysis.
The two conversational corpora examined are extremely similar also in terms of the Dimension which characterizes them most: they display the highest and most significant mean score in Dimension 1 (face-to-face conversation: 35.04 vs. movie conversation: 35.31). Quantitatively, the same can be said for the mean scores characterizing the other Dimensions (Dimension 5 excluded). As modeled in Chart 8, it can thus be concluded that both face-to-face and movie conversation belong to the same text type, which is primarily interpersonal, affective, interactional, and generalized and secondly, non-narrative, situation dependent, and not highly persuasive.
Chart 8. Multi-Dimensional Analysis results.
Chapter 4
Shot 2: Close-ups
4.1 Introduction
Having established the similarity of face-to-face and movie conversation

through a macro-analysis in Chapter 3, Chapter 4 narrows the scope and
provides close-ups which examine this similarity in detail. Firstly, a
comparative Multi-Dimensional Analysis of two movie genres – com-
edies and non-comedies – is presented in Section 4.2. Secondly, in
Section 4.3, phraseological comparisons are made between the whole
AMC, including all genres of movies, and the LSAC. The features exam-
ined are word lists, multi-word sequences and pattern types.
4.2 Multi-Dimensional Analysis of Movie Genre
In this section, Multi-Dimensional Analysis is used to investigate
whether the genre of a movie influences the resemblance between face-
to-face and movie conversation. As Chapter 3 demonstrated, the two
conversational domains are in fact very similar, despite what is usually
maintained in the literature (cf. Chapter 1). They share four out of
five Dimensions: they both have a positive score as far as Dimension 1
and 4 are concerned, and a negative score as regards Dimension 2 and
3. Face-to-face and movie conversation, consequently, belong to the
same text type in that they are involved, non-narrative, situation de-
pendent, and not highly persuasive. The only difference found concerns
Dimension 5: movie conversation has a positive score, so it is defined
as abstract, whereas face-to-face conversation has a negative score and
is labeled as non-abstract. In spite of this difference, however, neither
of the two conversational types has a high mean score, which means
that the difference between them can be considered minimal.
The present section illustrates, through Multi-Dimensional Analy-
sis, that movie genre – specifically, comedy and non-comedy28 – does
not significantly influence the resemblance of movie conversation to
face-to-face conversation. The results of the Multi-Dimensional Analy-
sis illustrated in Table 23 reflect what has already emerged in the ana-
lysis described in Chapter 3, which did not take movie genre into
account: even though comedies are slightly more similar than non-
comedies to face-to-face conversation, both movie genres have four
Dimensions out of five in common with the latter.
Table 23. Comparative Multi-Dimensional Analysis of LSAC versus AMC comedies
and AMC non-comedies.
Variable LSAC Comedies Non-comedies
Dimension 1 35.04 35.86 36.68
Dimension 2 -0.84 -1.11 -1.15
Dimension 3 -7.04 -5.68 -5.93
Dimension 4 0.60 0.24 1.59
Dimension 5 -2.04 2.20 0.87
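
The comparison drawn from Table 23 can be retraced with a few lines of code. The following is only a minimal sketch: the scores are copied from Table 23, while the function names and the use of the absolute gap between mean scores as a measure of closeness are assumptions made for illustration, not part of the original analysis.

```python
# Mean Dimension scores copied from Table 23 (Dimensions 1 to 5, in order).
SCORES = {
    "LSAC":         [35.04, -0.84, -7.04, 0.60, -2.04],
    "Comedies":     [35.86, -1.11, -5.68, 0.24, 2.20],
    "Non-comedies": [36.68, -1.15, -5.93, 1.59, 0.87],
}


def distance_from_lsac(genre):
    """Absolute gap between a genre and the LSAC on each Dimension.

    Using the absolute gap as 'closeness' is an illustrative assumption.
    """
    return [round(abs(g - l), 2) for g, l in zip(SCORES[genre], SCORES["LSAC"])]


def same_polarity(genre):
    """True where the genre's score has the same sign as the LSAC score."""
    return [(g > 0) == (l > 0) for g, l in zip(SCORES[genre], SCORES["LSAC"])]


def closer_genre():
    """Name the genre whose score is nearer to the LSAC on each Dimension."""
    com = distance_from_lsac("Comedies")
    non = distance_from_lsac("Non-comedies")
    return ["Comedies" if c < n else "Non-comedies" for c, n in zip(com, non)]


if __name__ == "__main__":
    for dim, genre in enumerate(closer_genre(), start=1):
        print(f"Dimension {dim}: closer genre = {genre}")
    print("Comedies share polarity with LSAC:", same_polarity("Comedies"))
    print("Non-comedies share polarity with LSAC:", same_polarity("Non-comedies"))
```

Under these assumptions, the sketch reproduces the pattern described in this section: comedies are nearer to the LSAC on Dimensions 1, 2 and 4, non-comedies on Dimensions 3 and 5, and both genres differ from the LSAC in polarity only on Dimension 5.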
In detail, comedy conversation is more similar to face-to-face conversation
with regard to Dimensions 1, 2 and 429. Non-comedy conversation,
instead, is closer to face-to-face conversation as regards Dimensions 3
and 530. This slightly closer similarity between comedies and face-to-face
conversation becomes more evident by taking into account only the
four Dimensions shared with face-to-face conversation (i. e. Dimensions
1, 2, 3, and 4, cf. bold in Table 23). Indeed, by excluding Dimension 5,
which is the Dimension on which face-to-face and movie conversation
differ most, and which neither comedies nor non-comedies
share with face-to-face conversation in terms of polarity, comedies are
closer to face-to-face conversation with regard to three Dimensions (i. e.
Dimensions 1, 2, and 4), whereas non-comedies are closer to it only with
respect to Dimension 3. The Dimension on which non-comedies differ
most from face-to-face conversation is Dimension 4. The following sections
will offer specific details on the results concerning Dimensions 1–5.
28 The comparison between face-to-face conversation and borderline genre movies
(cf. Section 2.4.1) is provided in Appendix 5.
29 Namely, those related to affective, interactional, and generalized contexts, to non-
narrative concerns, and to a not particularly high level of persuasion respectively.
30 Namely, those related to situation-dependent factors and to non-abstract information.
4.2.1 Informational vs. Involved Production (Dimension 1)
Non-comedies have a slightly higher occurrence of those features which
have a positive weight on Factor 1: as Table 24 illustrates, they dis-
play a higher number of that-deletions (that_del in the tables); verbs,
in particular uninflected presents, imperatives and third persons (pres) and
be (be_state); it pronouns (it); causative subordinating conjunctions (e. g.
because – sub_cos); wh-questions (wh_ques); and modals of possibility (i. e.
can, may, might, could – pos_mod). Conversely, they have a lower oc-
currence of nouns and attributive adjectives (respectively n and adj_attr)
which have a negative weight on Factor 1. As a consequence, the re-
spective weights of these linguistic items make Factor 1 of non-com-
edies slightly higher (36.68), and more involved, than that of com-
edies and face-to-face conversation (35.86 and 35.04 respectively).
Table 24. Linguistic features of Dimension 1 in AMC comedies and non-comedies.
Dimension 1 (+)
fname that_del contrac pres pro2 pro_do p-dem gen-emph pro1 it
comedies.txt 8.3 7.5 114.7 54.6 4.2 13.0 11.6 75.2 19.0
noncomedies.txt 8.7 5.0 120.4 52.5 4.1 11.7 8.0 72.3 20.5
fname be_state sub_cos prtcle pany gen_hdg amplifr wh-ques pos_mod o_and wh_cl finlprep
comedies.txt 2.7 1.2 8.7 7.1 1.9 2.7 5.6 7.1 11.3 2.4 4.4
noncomedies.txt 3.7 1.6 7.7 52.5 1.7 2.3 6.4 10.5 11.3 2.6 3.2
fname n prep adj_attr typetokn wrdlngth
comedies.txt 194.3 60.5 18.1 48.3 3.8
noncomedies.txt 189.3 63.0 15.1 56.5 3.9
4.2.2 Narrative vs. Non-Narrative Concerns (Dimension 2)
The difference related to Dimension 2 indicates that non-comedies
are slightly more non-narrative (-1.15) than comedies and face-to-face
conversation (-1.11 and -0.84 respectively). This may be because
(a) the linguistic items which have a positive weight
on Dimension 2 (such as past tenses) are less frequent in non-comedies
and/or (b) those which have a negative weight on this factor (such as
present tenses) are more frequent. However, Table 25 illustrates that past
tense and perfect aspect (pasttnse and perfects respectively), which would
make the text more narrative, are more frequent in non-comedies.
Therefore, the difference between comedies and non-comedies is to
be ascribed to the higher frequency of present tenses and it pronouns.
These items happen to be some of the features which have positive
weight on Dimension 1: as Table 24 shows, non-comedies have a
higher frequency of present tense and it pronouns, which not only have
weight on Dimension 1 and make it more involved, but also weigh
on Dimension 2, making it more non-narrative.
Dimension 2 (-)
fname pasttnse pro3 perfects pub_vb
comedies.txt 7.1 11.3 2.4 4.4
noncomedies.txt 10.5 11.3 2.6 3.2
4.2.3 Explicit vs. Situation-Dependent Reference (Dimension 3)
As regards Dimension 3, non-comedies are more situation-dependent
than comedies (-5.93 vs. -5.68 respectively); this makes the former
more similar to face-to-face conversation (-7.04). As Table 26 dem-
onstrates, this similarity depends on the fact that non-comedies have
fewer occurrences of the linguistic items which have a positive weight
on Dimension 3, and more occurrences which have a negative weight
on it. More specifically, non-comedies have two linguistic features
(relative clauses in subject position and phrasal connectors, i. e. rel_subj
and p_and) which carry a positive weight on Factor 3, whereas comedies
have three: relative clauses in object position, wh-pronouns that function
as a relative clause in object position with prepositional fronting,
and nominalizations (rel_obj, rel_pipe, and n_nom respectively).
Furthermore, non-comedies have two linguistic items which carry a negative
weight (see labels in bold in Table 26) on Dimension 3 (time
and place adverbs, i. e. tm_adv and pl_adv), whereas comedies have only
one (adverbs, i. e. advs).
Table 26. Linguistic features of Dimension 3 of AMC comedies and non-comedies.
Dimension 3 (-)
fname rel_obj rel_subj rel_pipe p_and n_nom tm_adv pl_adv advs
comedies.txt 0.3 0.6 0.1 0.8 10.2 6.4 12.4 47.9
noncomedies.txt 0.5 0.4 0.2 0.7 16.3 7.7 14.4 44.4
4.2.4 Overt Expression of Persuasion (Dimension 4)
Dimension 4 is the Factor which displays a major difference within the
present comparison: face-to-face conversation 0.60, comedies 0.24, and
non-comedies 1.59. This implies that face-to-face conversation and
comedies are less characterized by persuasion than non-comedies. As
illustrated in Table 27, this clearly depends on the slightly higher fre-
quency of the elements which have a positive weight on Dimension 4:
infinitive verbs, modals of prediction, subordinating conjunctions – condi-
tional, and modals of necessity (inf, prd_mod, sub_cnd, and nec_mod) are
more frequent in non-comedies; apart from this, suasive verbs and ad-
verbs within auxiliary (sua_vb and spl_aux) are marginally higher in
comedies.
Dimension 4 (+)
fname inf prd_mod sua_vb sub_cnd nec_mod spl_aux
comedies.txt 10.9 7.4 0.8 3.0 5.4 2.3
noncomedies.txt 13.6 8.5 0.9 4.9 5.6 2.4
4.2.5 Abstract vs. Non-Abstract Information (Dimension 5)
As pointed out in Chapter 3, Dimension 5 is the only Factor which
displays a marked difference between face-to-face and movie conversa-
tion: movie conversation has a positive mean score (1.66), whereas face-
to-face conversation has a negative one (-2.04). However, this is not
extremely significant because, in spite of the polar difference, the span
difference between the two conversational domains is very slight
(2.04+1.66=3.70). In terms of the present comparison, comedies appear
to be less similar to face-to-face conversation (2.20 vs. -2.04 respectively)
than non-comedies (0.87). It is worth noting that the dissimilarity which
emerged above between face-to-face and movie conversation is mainly
caused by the slightly higher presence in movie conversation of conjuncts
(conjncts), which, as Table 28 shows, are found especially in comedies.
Dimension 5 (-)
fname conjncts agls_psv by_pasv whiz_vbn sub_othr
comedies.txt 8.0 3.9 0.1 0.5 5.0
noncomedies.txt 5.3 5.4 0.3 0.4 5.2
4.2.6 Summary
The present Multi-Dimensional Analysis clearly demonstrates that
movie genre does not affect the similarity between face-to-face and
movie conversation: both comedies and non-comedies share four
Dimensions out of five with face-to-face conversation, the exception being
Dimension 5, on which both genres have a positive score whereas
face-to-face conversation has a negative one. It has emerged that comedies
are slightly more similar to spoken language than non-comedies, their
scores being closer on Dimensions 1, 2 and 4. Non-comedies, on the
other hand, are closer to face-to-face conversation on Dimension 3 and,
despite the difference in polarity, on Dimension 5. The principal
difference between face-to-face and movie conversation
derives mostly from the presence of conjuncts, which occur slightly
more in comedies. It may be concluded, however, that the quantitative
difference found in the two sub-corpora is so minimal that it does not
trigger any qualitative difference: both comedies and non-comedies
belong to the same text type as face-to-face conversation. This is
illustrated in Chart 9.
Chart 9. Multi-Dimensional Analysis of comedies, non-comedies, and face-to-face
conversation.
4.3 Phraseological Comparisons
In this section, a corpus-driven approach (Francis 1993, Tognini-Bonelli
2001, Biber 2009) is adopted to see whether the macroscopic
similarity between face-to-face and movie conversation which emerged
in Chapter 3 is also present at a microscopic level. In particular, the
present section focuses on word lists, lexical bundles, and multi-word
sequences and pattern types. The reason behind this quantitative and
qualitative investigation is the fundamental role played by phraseol-
ogy and recurrent semi-fixed phrases which comparable corpus-driven
research cannot ignore (Sinclair 1991, Hunston 2006, Biber 2009,
Forchini and Murphy 2010, cf. also Firth 1935a, 1935b, 1951a,
1951b, 1957a, 1957b and Halliday 2003b31).
4.3.1 Word Lists
The quantitative analysis of the word lists of face-to-face and movie
conversation (cf. Table 29) shows that the most frequent words in the two
corpora are highly similar: the nine most frequent words in the LSAC
(i. e. I, the, you, and, to, it, that, a, of) are also the nine most
frequent words in the AMC, albeit in a different order; moreover, twelve
other words occur among the 30 most frequent words of both corpora
(i. e. in, is, know, have, we, oh, this, it’s, what, just, do, on). This
means that the two conversational domains share 21, i. e. just over
two-thirds, of their thirty most frequent words.
31 The centrality to linguistic analysis of the mutual relations between the differ-
ent levels of language and of meaning as function in context arise from a tradi-
tion based on the pioneering work of John Rupert Firth and then developed by
the so-called new-Firthians – i. e. Michael Halliday and John Sinclair – and,
later, by contemporary scholars such as Biber, Francis and Hunston, Stubbs,
Tognini-Bonelli, inter alia.
Table 29. Word lists of the LSAC and the AMC32.
LSAC AMC
N Word Per 1,000 W N Word Per 1,000 W
1 I 34.50 1 YOU 40.19
2 THE 29.76 2 I 32.07
3 YOU 27.59 3 THE 29.37
4 AND 26.59 4 TO 21.43
5 TO 23.50 5 A 21.12
6 IT 18.99 6 AND 14.34
7 THAT 18.20 7 THAT 13.27
8 A 18.14 8 IT 12.88
9 OF 12.52 9 OF 12.24
10 YEAH 10.90 10 IS 10.16
11 IN 10.69 11 IN 10.02
12 IS 10.33 12 ME 09.69
13 KNOW 09.68 13 WHAT 09.62
14 WAS 09.16 14 THIS 08.75
15 HAVE 08.77 15 NO 08.19
16 LIKE 08.58 16 I’M 07.77
17 SO 08.47 17 ON 07.71
18 WE 07.88 18 YOUR 07.15
19 OH 07.69 19 OH 07.11
20 THEY 07.59 20 HAVE 06.83
21 THIS 07.57 21 DO 06.77
22 IT’S 07.54 22 FOR 06.76
23 WHAT 07.27 23 DON'T 06.74
24 JUST 07.20 24 MY 06.70
25 DO 07.00 25 KNOW 06.69
26 BUT 06.94 26 WE 06.34
27 WELL 06.52 27 JUST 06.06
28 UH 06.44 28 IT’S 05.66
29 ON 06.35 29 NOT 05.59
30 HE 06.29 30 ALL 05.58
32 The numbers in the table are normalized to 1,000.
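
The overlap between the two word lists can be checked mechanically. The sketch below copies the two rankings from Table 29 (with straight apostrophes) and counts the shared items; the variable and function names are chosen here for illustration only.

```python
# The 30 most frequent words of each corpus, in rank order, from Table 29.
LSAC_TOP30 = [
    "I", "THE", "YOU", "AND", "TO", "IT", "THAT", "A", "OF", "YEAH",
    "IN", "IS", "KNOW", "WAS", "HAVE", "LIKE", "SO", "WE", "OH", "THEY",
    "THIS", "IT'S", "WHAT", "JUST", "DO", "BUT", "WELL", "UH", "ON", "HE",
]
AMC_TOP30 = [
    "YOU", "I", "THE", "TO", "A", "AND", "THAT", "IT", "OF", "IS",
    "IN", "ME", "WHAT", "THIS", "NO", "I'M", "ON", "YOUR", "OH", "HAVE",
    "DO", "FOR", "DON'T", "MY", "KNOW", "WE", "JUST", "IT'S", "NOT", "ALL",
]


def shared_words(list_a, list_b):
    """Words occurring in both lists, kept in the order of the first list."""
    in_b = set(list_b)
    return [word for word in list_a if word in in_b]


def same_top_n(list_a, list_b, n):
    """True if the first n items of both lists are the same set of words."""
    return set(list_a[:n]) == set(list_b[:n])


if __name__ == "__main__":
    shared = shared_words(LSAC_TOP30, AMC_TOP30)
    print(len(shared), "shared words:", ", ".join(shared))
    print("Same nine most frequent words:", same_top_n(LSAC_TOP30, AMC_TOP30, 9))
```

On the lists as printed in Table 29, the two rankings have the same nine words at the top and share 21 of their 30 items.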
From a qualitative point of view, the analysis of the most frequent words
confirms the results of the Multi-Dimensional Analysis presented in the
previous chapter: both corpora are characterized by interpersonal
communication, given that both of them display a frequent use of first person
pronouns and possessives (I, we), second person pronouns and possessives (you),
private verbs (know), it pronouns (it), demonstrative pronouns (that,
this), coordinating conjunctions (and), the verb do, and wh-expressions (what).
4.3.2 Lexical Bundles: Two- and Four-grams
Other similarities between face-to-face and movie conversation emerge
when the two-grams present in the two corpora are compared: in Ta-
ble 30 it can be seen that the three most frequent two-grams (you know,
I don’t and in the) are identical. Moreover, 20 out of the 30 most fre-
quent two-grams in movies are also found within the 30 most frequent
two-grams of face-to-face conversation; and the other two-grams (e. g.
come on, all right, no no, thank you) are present in the LSAC, even
though they do not occur among its 30 most frequent two-grams.
It is known that texts with similar co-occurring linguistic features
also share at least one communicative function (Biber 1988: 63–64),
and, significantly, the two-grams present in both the corpora under
investigation (i. e. I don’t, are you, do you, come on, all right, I have,
thank you, etc.) are those which reflect the interpersonal function typi-
cal of conversation (cf. Biber et al. 1999) and highlight the commu-
nicative exchange between speakers (Halliday 1993). This finding
strengthens the claim made here that movie conversation is close to
face-to-face conversation not only on the basis of the linguistic struc-
tures they have in common, but also in terms of the pragmatic func-
tions these structures display.
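
For readers who wish to reproduce this kind of comparison on their own data, the following is a minimal sketch of how two-grams can be counted and normalized per 1,000 words. The function name, the whitespace tokenization and the choice of the total token count as the normalization base are assumptions; the source states only that the figures are normalized to 1,000.

```python
from collections import Counter


def ngrams_per_thousand(tokens, n=2):
    """Return each n-gram's frequency per 1,000 running words.

    Assumption: frequencies are normalized by the total number of tokens.
    """
    total = len(tokens)
    counts = Counter(" ".join(tokens[i:i + n]) for i in range(total - n + 1))
    return {gram: 1000 * count / total for gram, count in counts.items()}


if __name__ == "__main__":
    # Toy input invented for illustration; it is not taken from either corpus.
    toy = "you know i don't know you know".upper().split()
    rates = ngrams_per_thousand(toy, n=2)
    for gram, rate in sorted(rates.items(), key=lambda item: -item[1]):
        print(f"{gram:12s} {rate:7.2f}")
```

Calling the same function with n=4 yields four-grams. On real transcripts the tokenization step would need to handle contractions and transcription codes consistently across the corpora being compared.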
Table 30. Two-grams present in the LSAC and in the AMC (indicated in bold)33.
Face-to-face Conversation Movie Conversation
1 YOU KNOW 5.32 1 YOU KNOW 2.83
2 I DON’T 3.61 2 I DON’T 2.51
3 IN THE 2.66 3 IN THE 2.44
4 AND I 2.41 4 ARE YOU 2.18
5 I THINK 2.34 5 DO YOU 2.07
6 I MEAN 2.28 6 COME ON 2.01
7 HAVE TO 2.07 7 THIS IS 1.82
8 IT WAS 2.03 8 OF THE 1.81
9 OF THE 2.02 9 ALL RIGHT 1.55
10 AND THEN 2.00 10 HAVE TO 1.39
11 I WAS 1.98 11 ON THE 1.36
12 GOING TO 1.94 12 I WAS 1.35
13 DON’T KNOW 1.87 13 I HAVE 1.34
14 DO YOU 1.83 14 NO NO 1.31
15 WANT TO 1.57 15 A LITTLE 1.23
16 TO BE 1.54 16 I KNOW 1.22
17 ON THE 1.54 17 THANK YOU 1.22
18 THIS IS 1.47 18 AND I 1.20
19 TO DO 1.45 19 HAVE A 1.14
20 I KNOW 1.43 20 IF YOU 1.14
21 UH HUH 1.36 21 I MEAN 1.12
22 IF YOU 1.31 22 OUT OF 1.12
23 KIND OF 1.31 23 DON’T KNOW 1.11
24 I HAVE 1.29 24 TO DO 1.08
25 YOU HAVE 1.19 25 I THINK 1.06
26 YOU CAN 1.19 26 TO BE 1.04
27 TO THE 1.18 27 I JUST 1.03
28 BUT I 1.16 28 I’M SORRY 1.02
29 HAVE A 1.13 29 TO THE 1.02
30 ARE YOU 1.11 30 YOU HAVE 1.00

This claim is further supported by the fact that the most frequent lexical
bundle in the two corpora is you know, even though it is almost twice
as frequent in the LSAC (5.32) as in the AMC (2.83). This is
thought-provoking, since you know is very frequent in conversation
(Kennedy 1998, Biber et al. 1999), as illustrated in Chapter 1, and is
usually considered part of the core spoken language (McCarthy 1999,
Erman 2001); consequently, its high frequency in both corpora makes movie
conversation similar to face-to-face conversation also along this
parameter. Furthermore, despite this numerical difference, empirical data
have shown (cf. Forchini 2009, 2010) that the patterning of you know, in
terms of its general distribution and function, is extremely similar in
the two conversational domains. As a matter of fact, you know occurs
homogeneously and especially in mid-position in the turn; it occurs less
in initial position and rarely in final position; and it is usually
employed when the speaker recounts or comments on something, often
providing new information or information that may be unknown to the
listener. Finally, it is also interesting to note that another
characteristic shared by face-to-face and movie conversation is the
frequent co-occurrence of you know with other discourse markers,
interjections, and inserts. In particular, in face-to-face conversation,
you know principally occurs with I mean in the clusters I mean you know
and you know I mean (cf. Table 31). The high frequency of you know
suggests that this two-gram probably belongs to a larger cluster, which
may correspond to the pattern discourse marker/insert + you know or
you know + discourse marker/insert.
Table 31. Clusters of you know in the LSAC34.
N Cluster Per 1,000 W
1 YOU KNOW WHAT I 0.11
2 YOU KNOW I MEAN 0.06
3 WHAT I MEAN 0.05
4 YOU KNOW YOU KNOW 0.04
5 DO YOU KNOW WHAT 0.04
6 I MEAN YOU KNOW 0.04
7 YOU KNOW I DON’T 0.04
8 I DON'T KNOW 0.03
9 YOU KNOW AND I 0.03
10 YOU KNOW I THINK 0.03
11 YOU KNOW AND THEN 0.03
12 YOU KNOW I WAS 0.02
13 YOU KNOW IF YOU 0.02
14 YOU KNOW IT’S LIKE 0.02
15 WELL YOU KNOW I 0.02
16 YOU KNOW IT WAS 0.02
17 YOU KNOW WHAT I’M 0.02
18 UH HUH YOU KNOW 0.02
19 YOU KNOW UH HUH 0.02
20 WELL YOU KNOW WHAT 0.02
21 BUT YOU KNOW WHAT 0.02
22 I SAID YOU KNOW 0.02
23 DO YOU KNOW WHERE 0.01
24 AND I SAID 0.01
25 YOU KNOW WHAT YOU 0.01
26 AND YOU KNOW I 0.01
27 AND YOU KNOW WHAT 0.01
28 YOU KNOW I JUST 0.01
29 YOU KNOW I KNOW 0.01
30 BUT YOU KNOW I 0.01
A similar pattern emerges from the movie dialogue corpus: even though
there are only 0.30 occurrences (per thousand words) of the cluster
you know I mean in the AMC, its frequent co-occurrence with other
discourse markers, interjections and inserts is further confirmed by the
left (L1) and right (R1) collocates of you know in both corpora. These
collocates are usually expressions like uh, oh, well, like, just, but, mean
(cf. Tables 32 and 33). In the corpus of face-to-face conversation, there
are more R1 collocates than in the AMC. This may simply be due to a
difference in the corpora size or to a difference in genre, meaning
that in movies you know might belong only to the discourse marker/
insert + you know and not to the you know + discourse marker/insert
cluster, as it does in face-to-face conversation.
Table 32. The 10 most frequent L1 and R1 discourse marker/insert collocates of you
know in the LSAC35.
Word L1 R1
WELL 1.70
BUT 1.50 0.50
LIKE 1.10 1.00
UH 0.90 0.40
SO 0.60 0.50
YEAH 0.50 0.40
UM 0.50
JUST 0.40
MEAN 0.40 0.40
OH 0.40
Table 33. L1 and R1 discourse marker/insert collocates of you know in the AMC36.
Word L1 R1
BUT 0.50 0.10
HEY 0.30
MEAN 0.30
WELL 0.20
UH 0.10
JUST 0.09 0.30
LIKE 0.09
OH 0.09
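
The notion of L1 and R1 collocates used in Tables 32 and 33 can be illustrated with a short sketch that counts the word immediately to the left and to the right of each occurrence of you know. The function name and the toy input are hypothetical, and the concordancing procedure actually used for the tables is not described in this section, so the code should be read as one possible approximation.

```python
from collections import Counter


def l1_r1_collocates(tokens, node=("YOU", "KNOW")):
    """Count the words directly before (L1) and after (R1) each node match."""
    span = len(node)
    left, right = Counter(), Counter()
    for i in range(len(tokens) - span + 1):
        if tuple(tokens[i:i + span]) == node:
            if i > 0:
                left[tokens[i - 1]] += 1
            if i + span < len(tokens):
                right[tokens[i + span]] += 1
    return left, right


if __name__ == "__main__":
    # Toy utterance invented for illustration, not drawn from the LSAC or AMC.
    toy = "WELL YOU KNOW I MEAN YOU KNOW LIKE".split()
    l1, r1 = l1_r1_collocates(toy)
    print("L1:", dict(l1))
    print("R1:", dict(r1))
```

To obtain figures comparable to those in the tables, the raw counts would still have to be normalized and restricted to discourse markers and inserts, which requires a word list that this sketch does not supply.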
Similarly, as Table 34 illustrates, face-to-face and movie conversation
have similar features also in terms of four-grams: although they do
not share many of them, there are some (such as I want you to and
you want me to/do you want me; do you know why and I don’t know
why; I thought you were and I thought it was) which are clearly
interchangeable and functionally similar. Together with other bundles which
have personal pronouns (e. g. you know and I don’t), for example, these
four-grams highlight the interpersonal function which is typical of
spoken language.
35, 36 The numbers in the table are normalized to 1,000.
Table 34. LSAC and AMC four-grams37 (similar and/or functionally similar four-
grams in bold).
1 # WHITE YES # 0.30 1 I WANT YOU TO 0.23
2 I DON’T KNOW IF 0.19 2 WHAT ARE YOU DOING 0.23
3 I DON’T KNOW WHAT 0.18 3 I DON’T KNOW WHAT 0.20
4 COLLEGE # WHITE YES 0.18 4 NO NO NO NO 0.16
5 DO YOU WANT TO 0.16 5 WHAT DO YOU MEAN 0.15
6 I DON’T WANT TO 0.13 6 ARE YOU TALKING ABOUT 0.14
7 YOU DON’T HAVE TO 0.11 7 BOMB BOMB BOMB BOMB 0.13
8 BA BS COLLEGE # 0.11 8 YOU WANT ME TO 0.13
9 YOU KNOW WHAT I 0.10 9 WHAT DO YOU THINK 0.12
10 YOU WANT ME TO 0.10 10 COME ON COME ON 0.11
11 I DON’T KNOW I 0.10 11 WHAT ARE YOU TALKING 0.11
12 I WAS GOING TO 0.10 12 NICE TO MEET YOU 0.10
13 I DON’T KNOW HOW 13 ARE YOU DOING HERE
14 SOME COLLEGE # WHITE 14 WHAT THE HELL IS
15 IF YOU WANT TO 15 HOW DO YOU KNOW
16 BS COLLEGE # WHITE 16 THE HELL ARE YOU
17 I DON’T THINK SO 17 TO TALK TO YOU
18 OR SOMETHING LIKE THAT 18 YOU DON’T HAVE TO
19 ARE YOU GOING TO 19 A LONG TIME AGO
20 # # TO # 20 DO YOU KNOW WHY
21 WELL I DON’T KNOW 21 I DON’T HAVE A
22 AND I WAS LIKE 22 I DON’T WANT TO
23 I DON’T KNOW WHY 23 I JUST WANTED TO
24 MA MS # WHITE 24 I KNOW I KNOW
25 MS # WHITE YES 25 I THOUGHT YOU WERE
26 DO YOU WANT ME 26 IF I TOLD YOU
27 WHAT ARE YOU DOING 27 LET ME ASK YOU
28 BUT I DON’T KNOW 28 SO WHAT ARE YOU
29 #### 29 THE REST OF THE
30 I THOUGHT IT WAS 30 WHAT DO YOU WANT
Another similarity emerges from the fact that most of the four-grams
shared by both corpora are verb phrase fragments, such as I don’t know
what, you want me to, etc. (cf. Table 34), which are, of course, typical
of spoken language (cf. Biber 2009).
4.3.3 Multi-Word Sequences and Pattern Types
Further insights into the similarities between face-to-face and movie
conversation are provided by findings regarding multi-word formulaic
sequences. As Biber (2009: 289) puts it, “the most frequent word
sequences (lexical bundles) usually incorporate both function words
and lexical words”. It has been shown that the typical multi-word
patterns of conversation are considerably different from those typifying
written language: the former “tend to be fixed sequences (including
both function words and content words)”, whereas most patterns in
academic writing, for example, “are formulaic frames consisting of
invariable function words with an intervening variable slot that is filled
by content words” (Biber 2009: 275). An investigation into this feature
in movie conversation has brought to light that, as in spontaneous
conversation, most four-word sequences are composed of both content
words and function words, which are equally likely to be fixed
(i. e. 53.84 % and 46.15 % respectively).
Another feature the two conversational domains share is the fact
that most four-word sequences that include a content word are composed
of the content word in a sequence with three function words
(e. g. I don’t know if). The frequency of this tendency in movie
language is illustrated in Table 35: 83 % of the sequences that include
content words are composed of one content word in sequence with three
function words (i. e. what are you doing, I want you to, I don’t know
what, what do you mean, are you talking about, you want me to).
Table 35. Multi-word formulaic sequences composed of content words in the AMC.
composed of 1 content word in sequence with 3 function words    83 %
composed of 2 content words in sequence with 2 function words   17 %
Furthermore, a closer investigation of content words shows that in
spoken language the words “know, think (thought), and what occur as
content words in almost 50 % of the conversational multi-word sequences
that include a content word of any kind” (Biber 2009: 300). In much
the same way, the most frequent verbs in the multi-word sequences
found in the AMC are know, want, talk (not mentioned by Biber),
and think (thought). These patterns, and their raw occurrences, are
illustrated in Table 36.
Table 36. Most frequent verbs in multi-word sequences in the AMC.
VERBS RAW OCCURRENCES
KNOW 78
I DON’T KNOW WHAT 22
HOW DO YOU KNOW 8
YOU KNOW WHAT I'M 7
I KNOW I KNOW 7
DO YOU KNOW WHY 7
WANT YOU TO KNOW 6
I MEAN YOU KNOW 6
BUT I DON’T KNOW 5
I DON’T EVEN KNOW 5
I DON'T KNOW IF 5
WANT 66
I WANT YOU TO 25
YOU WANT ME TO 14
WHAT DO YOU WANT 7
I DON’T WANT TO 7
I JUST WANTED TO 7
WANT YOU TO KNOW 6
TALK 53
ARE YOU TALKING ABOUT 15
WHAT ARE YOU TALKING 12
TO TALK TO YOU 8
I NEED TO TALK 6
NEED TO TALK TO 6
TALK TO YOU ABOUT 6
THINK / THOUGHT 37
WHAT DO YOU THINK 13
I THOUGHT YOU WERE 7
I DON’T THINK I 6
I THINK I’M GONNA 6
DO YOU THINK YOU 5
Finally, as Table 37 shows, even though both conversation and academic
writing use the full range of possible pattern types, over 50 %
of the multi-word sequences in conversation represent continuous
sequences of fixed elements, with a preceding or following variable slot
(e. g. *234, 123*, 12**, etc.). Academic writing, instead, prefers an
internal variable slot (e. g. 12*4, 1*34, etc.).
Table 37. Distribution of common multi-word sequences across pattern types,
showing the % of all sequences in conversation vs. academic writing
(Biber 2009: 294).
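
The pattern notation used here (123*, 12*4, and so on) can be made concrete with a small sketch that turns four-grams into frames by replacing one or two positions with a variable slot (*) and summing the counts of all four-grams that fit each frame. This is a simplified illustration, not Biber's (2009) procedure: the function names and the limit of two slots per frame are assumptions, and the three input counts are taken from the raw occurrences in Table 36.

```python
from collections import Counter
from itertools import combinations


def frames(fourgram_counts, max_slots=2):
    """Aggregate four-gram counts into frames with 1..max_slots '*' positions."""
    totals = Counter()
    for gram, count in fourgram_counts.items():
        words = gram.split()
        for k in range(1, max_slots + 1):
            for slots in combinations(range(len(words)), k):
                frame = " ".join(
                    "*" if i in slots else word for i, word in enumerate(words)
                )
                totals[frame] += count
    return totals


def pattern_type(frame):
    """Label a frame in the positional notation, e.g. 'WHAT DO YOU *' -> '123*'."""
    return "".join(
        "*" if word == "*" else str(pos)
        for pos, word in enumerate(frame.split(), start=1)
    )


def slot_position(frame):
    """'internal' if the first and last positions are both fixed, else 'external'."""
    parts = frame.split()
    return "internal" if parts[0] != "*" and parts[-1] != "*" else "external"


if __name__ == "__main__":
    # Raw occurrences of three AMC four-grams as listed in Table 36.
    sample = {"WHAT DO YOU THINK": 13, "WHAT DO YOU WANT": 7, "I WANT YOU TO": 25}
    result = frames(sample)
    for frame in ("WHAT DO YOU *", "I * * TO"):
        print(frame, pattern_type(frame), slot_position(frame), result[frame])
```

Because only three four-grams are supplied, the totals produced by this sketch are necessarily lower than the frame totals reported for the whole AMC.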
Similarly to the spoken domain, movie language displays the 123*,
12**, *23*, **34, and *234 patterns, which include a preceding or
following variable slot. As Table 38 demonstrates, movie conversation
prefers the external variable slot type (75.38 %) to the internal one
(24.61 %), particularly the 123* variant, which is also very frequent
in face-to-face conversation.
Table 38. Most Frequent Patterns with External and Internal Variable Slot (*) in
the AMC.
PATTERNS WITH EXTERNAL *        PATTERNS WITH INTERNAL *
1 2 3 * PATTERN                 1 * * 4 PATTERN
TYPES 7                         TYPES 4
TOKENS 155                      TOKENS (TOT) 89
WHAT DO YOU * 42                I * * TO 39
WHAT ARE YOU * 37               YOU * * TO 22
I DON’T KNOW * 27               I * * KNOW 18
WHAT THE HELL * 16              I * * IF 10
THE REST OF * 12
I NEED TO * 11
GET OUT OF * 10
1 2 * * PATTERN                 1 2 * 4 PATTERN
TYPES 4                         TYPES 1
TOKENS (TOT) 79                 TOKENS (TOT) 11
I DON’T * * 25                  THANK YOU * MUCH 11
ARE YOU * * 30
DO YOU * * 12
I MEAN * * 12
* 2 3 * PATTERN                 1 * 3 4 PATTERN
TYPES 2                         TYPES 1
TOKENS (TOT) 70                 TOKENS (TOT) 13
* DO YOU * 50                   I**A 13
* DON’T HAVE * 20
* * 3 4 PATTERN
TYPES 2
TOKENS (TOT) 29
* * ARE YOU 15
* * YOU KNOW 14
* 2 3 4 PATTERN
TYPES 1
TOKENS (TOT) 13
* DON’T HAVE TO 13
TOTAL                           TOTAL
TYPES 16                        TYPES 6
TOKENS 346                      TOKENS 113
75.38%                          24.61%
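
The totals and percentages at the foot of Table 38 follow directly from the token counts given for each pattern type. The sketch below recomputes them; the dictionary and function names are chosen here for illustration.

```python
# Token totals per pattern type, copied from Table 38.
EXTERNAL = {"123*": 155, "12**": 79, "*23*": 70, "**34": 29, "*234": 13}
INTERNAL = {"1**4": 89, "12*4": 11, "1*34": 13}


def slot_shares(external, internal):
    """Return external and internal token totals and their percentage shares."""
    ext_tokens = sum(external.values())
    int_tokens = sum(internal.values())
    total = ext_tokens + int_tokens
    return ext_tokens, int_tokens, 100 * ext_tokens / total, 100 * int_tokens / total


if __name__ == "__main__":
    ext_tokens, int_tokens, ext_pct, int_pct = slot_shares(EXTERNAL, INTERNAL)
    print(f"external: {ext_tokens} tokens ({ext_pct:.2f} %)")
    print(f"internal: {int_tokens} tokens ({int_pct:.2f} %)")
```

The recomputed token totals (346 and 113) match the table. The internal share is 24.6187... %, which rounds to 24.62 %; the figure of 24.61 % printed in the table corresponds to the same value truncated to two decimals.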
4.3.4 Summary
Taken as a whole, the results of the investigation into phraseology have
demonstrated that if language is placed on a continuum, where spo-
ken language is located at one end and written language at the other,
movie conversation is situated near spoken language. The following
micro-features, which are found in both the AMC and the LSAC, lend
support to this statement:
– the most frequent words are highly similar;
– the three most frequent two-grams (you know, I don’t and in the)
are identical;
– the patterning of you know, especially in terms of its general distri-
bution and function, is extremely similar;
– 20 out of the 30 most frequent two-grams in movies are also found
within the 30 most frequent two-grams of face-to-face conversa-
tion;
– the other two-grams (e. g. come on, all right, no no, thank you) are
present in the LSAC, even though they do not occur among its 30
most frequent two-grams;
– the three most frequent four-grams of movie conversation (I want
you to, what are you doing, I don’t know what) are present in face-
to-face conversation;
– even though there are not many four-grams in common, there are
some which are functionally similar (e. g. I want you to and you want
me to / do you want me; do you know why and I don’t know why;
I thought you were and I thought it was) and the majority of them
are verb phrase fragments;
– both movie and face-to-face conversation have content words and
function words in multi-word sequences which are equally likely
to be fixed;
– most of their four-word sequences are composed of: one content
word + three function words;
– most of their frequent verbs in content word sequences are know,
think (thought), and want;
– both movie and face-to-face conversation display continuous se-
quences of fixed elements, with a preceding or following variable
slot (e. g. 123*).
It may thus be concluded that, since the word lists, two-grams, four-
grams, and formulaic patterns that are typical of spoken language have
also been found in movie conversation, the two conversational domains
are significantly similar. On a functional level, this finding reflects the
interpersonal trait which emerged from the Multi-Dimensional Analy-
sis in Chapter 3.
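The kind of comparison summarized above (how many of the most frequent n-grams of one corpus also appear among the most frequent n-grams of another) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the tool used in the study: tokenization is naive whitespace splitting, the two strings are toy stand-ins rather than AMC or LSAC data, and the function names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-word sequences as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(text, n, k):
    """Return the k most frequent n-grams of a text (naive whitespace tokens)."""
    tokens = text.lower().split()
    return [gram for gram, _ in Counter(ngrams(tokens, n)).most_common(k)]

def shared(list_a, list_b):
    """N-grams of list_a that also occur in list_b, in list_a's order."""
    set_b = set(list_b)
    return [gram for gram in list_a if gram in set_b]

# Toy stand-ins for a movie corpus and a face-to-face corpus (NOT real data).
movie = "you know i don't know what you mean you know"
speech = "i don't know you know what i mean you know"

top_movie = top_ngrams(movie, 2, 30)
top_speech = top_ngrams(speech, 2, 30)
common = shared(top_movie, top_speech)

print(len(common), "of", len(top_movie),
      "movie two-grams also occur in the face-to-face list")
```

Setting `n` to 4 gives the corresponding four-gram comparison. A real study would need a proper tokenizer and far larger samples before any overlap count became meaningful.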
Chapter 5
Closing Credits: Implications and Applications
5.1 Authentic Movie Language
In the introduction to this study I pointed out that I wanted to examine
the linguistic features characterizing American face-to-face and movie
conversation, two domains which are usually claimed to differ espe-
cially in terms of spontaneity. It has already been recalled in Sections
1.1, 1.3, and 3.1 that natural conversation is considered the quint-
essence of the spoken language (Sinclair 2004b) as it is totally spon-
taneous, whereas movie conversation is usually described as non-
spontaneous, being artificially written-to-be-spoken (Nencioni 1976,
Gregory and Carroll 1978, Taylor 1999, Rossi 2003, Pavesi 2005) and,
thus, not likely to represent the general usage of conversation (Sinclair
2004b). My objective was to investigate authentic movie data (and not
webscripts) to compare these two different linguistic provinces of dis-
course, as Sinclair (2004b: 80) advocates:
In summary I am advocating that we should trust the text. We should be open
to what it may tell us. We should not impose our ideas on it, except perhaps
just to get started. Until we see what the preliminary results are, we should
apply only frameworks that are loose and flexible, in order to accommodate
the new information that will come from the text. We should expect to
encounter unusual phenomena; we should accept that a large part of our
linguistic behaviour is subliminal, and that therefore we may find a lot of
surprises. We should search for models that are especially appropriate to the
study of texts and discourse.
The study of language is moving into an era in which the exploitation of
modern computers will be at the centre of progress. The machines can be har-
nessed in order to test our hypotheses, they can show us things that we may not
already know and even things which shake our faith quite a bit in established
models, and which may cause us to revise our ideas very substantially. In all of
this my plea is to trust the text (Sinclair 2004a: 23).
The reasons for adopting this empirical approach can be summed up as
follows (see also Section 2.2): first of all, authentic data allow for
descriptions of naturally occurring combinations of words as opposed to
the limiting and sometimes deviating traits of intuition (Biber, Conrad,
and Reppen 1998, Partington 1998, Ulrych 1999, Stubbs 2001, Hoff-
mann 2004, Sinclair 2006, Johansson 2007). Second, authentic data
offer both quantitative and qualitative analyses (McEnery and Wilson
1996, Sinclair 1996, Biber, Conrad, and Reppen 1998, Kennedy 1998,
Aarts 2001, Halliday 2003a). Third, being empirical, they lead toward
descriptivism as opposed to prescriptivism (Francis 1993, Tognini-
Bonelli 2001, Halliday 2003c, Johansson 2007, Renouf 2007). Fourth,
they can be collected in large databases in electronic format and can,
consequently, be easily and quickly processed, replicated, and shared
(Sinclair 1991, Stubbs 2001, Hoffmann 2004, Sinclair 2004a, Wynne
2004, Börjars 2006, Mahlberg 2006). Last, but not least, they are useful
sources not only for research, but also for teaching (Hunston 2002,
Reppen 2010).
Nowadays, indeed, as Sinclair (2004c) reminds us, many teachers
consider both large and small corpora as useful tools. In particular,
professionals in the fields of English as a Second Language and as a
Foreign Language (ESL/EFL) “have adopted a preference for ‘authentic’
materials, presenting language from natural texts rather than made-up
examples” (Reppen 2010: 4). One of the main advantages of using
corpora in the classroom is that they “provide a ready source of natural,
or authentic, texts for language learning” (Reppen 2010: 4). Further-
more, they allow the learner to become an empowered and autono-
mous traveller (Bernardini 2004).
5.1.1 A Reflection of Spoken Language
Following the principles and the methodology illustrated in Chapter 1,
and after overviewing the general features characterizing face-to-face and
movie conversation in Chapters 2 and 3 respectively, Chapters 4 and 5
aimed to answer the following research questions:
1) To what extent do face-to-face and movie conversation differ from or
resemble each other?
2) Does movie genre influence this difference or resemblance?
3) Are the features of the core spoken language equally frequent in
movie language?
4) If so, what are their pragmatic functions in both face-to-face con-
versation and movie language?
Regarding question number one, both the macro and the micro-analy-
ses have empirically shown that face-to-face and movie conversation
do not differ to a great extent. In terms of Biber’s (1988) Dimensions,
it has been demonstrated that both the conversational domains under
investigation are informal, non-narrative, situation-dependent and not
highly persuasive. Consequently, since these are all factors linked to
the spontaneous nature of conversation (cf. Biber 1988 and Chapter 1),
it can be concluded that movie language also has a significant amount
of spontaneity.
The Multi-Dimensional Analysis of comedies and non-comedies,
research question number two, has shown that movie genre does not
influence the similarity between face-to-face and movie conversation,
even though comedies are slightly more similar to spoken language.
The main difference between the two genres, which is minimal, has
been ascribed to the presence of conjuncts, which occur slightly more in
comedies than in non-comedies.
As for research question number three, the investigation has proven
that the similarity between face-to-face and movie conversation which
emerged at a macroscopic level is also present at a more microscopic
level. In particular, the two conversational domains have been shown
to share the following features:
– the most frequent words, two-grams, and four-grams;
– certain two-grams and four-grams which highlight the interpersonal
function brought to light by Multi-Dimensional Analysis;
– content words and function words in multi-word sequences which
are equally likely to be fixed;
– most of the four-word sequences composed of one content word
+ three function words;
– most of the frequent verbs in content word sequences;
– continuous sequences of fixed elements, with a preceding or follow-
ing variable slot.
On the whole, the functional interpretation of these specific features
reflects the results that emerged within the more general Multi-
Dimensional Analysis and especially highlights the importance of
Dimension 1: the most frequent words, two-grams and four-grams in
both face-to-face and movie conversation offer further proof of the
interpersonal function which characterizes these domains. Besides, their
common preference for function words and the use of clauses, for
example, not only reflects the results of Multi-Dimensional Analysis,
which has shown that that-clauses, wh-clauses, causative adverbial
clauses, and conditional adverbial clauses (i. e. finite dependent clauses) are
characteristic of interpersonal spoken registers, but also demonstrates
that the grammar of movie and face-to-face conversation does not
differ.
5.1.2 A Source for Spoken Language Teaching
Mauranen (2004: 89) emphasizes the role of corpora in discovering
formulaic expressions and, in particular, of spoken corpora, which
“offer direct access to characteristics of speech, so often inadequately
described in textbooks”. As she aptly points out, “in spoken language,
learners get less help from standard pedagogic descriptions than they
do for writing, and therefore often need to work out the use of linguis-
tic features for themselves. This is clearly a point where corpora can
help” (Mauranen 2004: 103). This is promising as regards the future
utility of the AMC or of other similar corpora. Such corpora are an
effective resource for teaching features of spoken language, as demon-
strated by experiments conducted with Italian university students study-
ing English (Forchini forthcoming). The students in question success-
fully acquired features of spoken discourse, such as elisions, blends, repetitions,
false starts, reformulations, discourse markers, and interjections, and were
also highly motivated by the use of movies.
In addition to the advantages of using movies with students, the
relative simplicity of accessing movie DVDs and transcribing movie
speech, compared to the complications of collecting spoken data (cf.
also Biber et al. 1999: 1041, McCarthy 1999: 13, Halliday 2005: 162),
should not be ignored. One disadvantage of spoken corpora, indeed, is
that they are laborious to compile (Halliday 2003c, Mauranen 2004).
First, representative speakers who agree to be recorded have to be found;
second, they have to be recorded in such a way that their recording can
then be easily accessed and heard; finally, these recordings need to be
transcribed in order to be searchable with corpus linguistic software.
Using movie language to study spoken features also offers a possible
solution to the problem of having authentic teaching material.
5.2 Concluding Remarks and Future Directions
The present study has pointed out the linguistic resemblance of movie
conversation to face-to-face conversation and refutes the claim that
movie language has “a very limited value” and that it is “not likely to
be representative of the general usage of conversation” (cf. Sinclair
2004b: 80).
It cannot be ignored that the motion-picture medium imposes cer-
tain non-spontaneous traits on movie language, limiting it in terms of
total spontaneity, but in this study I have shown that the current view
of movie dialogue as being non-representative of the general usage of
conversation needs to be re-considered. The surprising results of this
empirical study are in direct contrast with what has been proclaimed in
the literature for thirty years and require a revision of both the notion
that movie language has limited value and the label artificial, which
scholars have often given to this type of conversational domain.
The main implication of the striking similarity between the com-
municative functions and linguistic features characterizing face-to-face
and movie conversation is that, from a theoretical point of view, it
legitimates the use of movie language to teach features of spoken lan-
guage. On the practical side, given the relative ease of collecting movie
conversation material compared to collecting authentic spoken material,
the use of movie language as a source for teaching spoken features
becomes appealing. Lastly, there is the universal attraction that movies
hold for viewers, which should not be underestimated.
In terms of future directions, it could be interesting to investigate
movies further to see whether other genre categories produce different
results. The study could also be broadened out to include and compare
other varieties of English. The advantages of exploring real movie data
and the considerable potential that movie language offers for the study
of the spoken language, however, have been amply exemplified.
Appendices
Appendix 1. Linguistic Features Codes (Multi-Dimensional Analysis)
COUNT Codes Linguistic Features
{Positive Dimension 1}
1 = prv_vb Private Verbs (e. g. believe, feel, think)
2 = that_del ‘That’ Deletion
3 = contrac Contraction
4 = pres Verb (uninflected present, imperative & third person)
5 = pro2 Second person pronoun / possessive
6 = pro_do Verb ‘Do’
7 = pdem Demonstrative Pronoun
8 = gen_emph Adverb / Qualifier – Emphatic (e. g. just, really, so)
9 = pro1 First person pronoun / possessive
10 = it Pronoun ‘it’
11 = be_state Verb ‘Be’ (uninflected present tense, verb and auxiliary)
12 = sub_cos Subordinating Conjunction – Causative (e. g. because)
13 = prtcle Discourse Particle (e. g. now)
14 = pany Nominal Pronoun (e. g. someone, everything)
15 = gen_hdg Adverbial – Hedge (e. g. almost, maybe)
16 = amplifr Adverb / Qualifier – Amplifier (e. g. absolutely, entirely)
17 = wh_ques Wh- question
18 = pos_mod Modals of possibility (can, may, might, could)
19 = o_and Coordinating conjunction – clausal connector
20 = wh_cl Wh- Clause
21 = finlprep Stranded Preposition
{Negative Dimension 1}
22 = n Noun
23 = prep Preposition
24 = adj_attr Attributive Adjective
{Dimension 2}
25 = pasttnse Past Tense Verb
26 = pro3 Third person pronoun (except ‘it’)
27 = perfects Verb – Perfect Aspect
28 = pub_vb Public Verbs (e. g. assert, complain, say)
{Dimension 3}
29 = rel_obj Wh pronoun – relative clause – object position
30 = rel_subj Wh pronoun – relative clause – subject position
31 = rel_pipe Wh pronoun – relative clause – object position with prepositional fronting (‘pied piping’)
32 = p_and Coordinating conjunction – phrasal connector
33 = n_nom Singular noun – nominalization
34 = tm_adv Adverb – Time
35 = pl_adv Adverb – Place
36 = advs Adverb (not including counts 8, 15, 16, 34, 35, 49)
{Dimension 4}
37 = inf Infinitive Verb
38 = prd_mod Modal of prediction (will, would, shall)
39 = sua_vb Suasive Verb (e. g. ask, command, insist)
40 = sub_cnd Subordinating conjunction – conditional (e. g. if, unless)
41 = nec_mod Modal of necessity (ought, should, must)
42 = spl_aux Adverb within auxiliary (splitting aux-verb)
{Dimension 5}
43 = conjncts Adverbial – conjuncts (e. g. however, therefore, thus)
44 = agls_psv Agentless passive verb
45 = by_pasv Passive verb + by
46 = whiz_vbn Passive postnominal modifier
47 = sub_othr Subordinating conjunction – Other (e. g. as, except, until)
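The grouping of feature codes by Dimension can be read as a recipe for a dimension score. In Biber's (1988) Multi-Dimensional approach, feature counts are normalized per 1,000 words and standardized, and the standardized scores of the negative features are subtracted from the sum of those of the positive features. The Python sketch below illustrates that arithmetic for a small subset of the Dimension 1 codes. It is a simplified illustration only: every number in it (raw counts, text length, reference means and standard deviations) is hypothetical, and the function names are not taken from any existing tool.

```python
# Subset of Dimension 1 codes from Appendix 1 (illustrative selection).
POSITIVE_D1 = ["prv_vb", "contrac", "pro2"]   # among counts 1-21
NEGATIVE_D1 = ["n", "prep", "adj_attr"]       # counts 22-24

def per_thousand(raw_counts, n_words):
    """Normalize raw feature counts to a rate per 1,000 words."""
    return {code: 1000 * count / n_words for code, count in raw_counts.items()}

def z_scores(rates, ref_mean, ref_sd):
    """Standardize each rate against reference means and standard deviations."""
    return {code: (rates[code] - ref_mean[code]) / ref_sd[code] for code in rates}

def dimension_score(z, positive, negative):
    """Sum of positive-feature z-scores minus sum of negative-feature z-scores."""
    return sum(z[c] for c in positive) - sum(z[c] for c in negative)

# HYPOTHETICAL values, chosen only to make the arithmetic easy to follow.
raw = {"prv_vb": 60, "contrac": 120, "pro2": 90,
       "n": 300, "prep": 160, "adj_attr": 50}
n_words = 2000
ref_mean = {"prv_vb": 20.0, "contrac": 15.0, "pro2": 10.0,
            "n": 180.0, "prep": 110.0, "adj_attr": 60.0}
ref_sd = {"prv_vb": 10.0, "contrac": 20.0, "pro2": 15.0,
          "n": 30.0, "prep": 25.0, "adj_attr": 20.0}

rates = per_thousand(raw, n_words)
z = z_scores(rates, ref_mean, ref_sd)
score = dimension_score(z, POSITIVE_D1, NEGATIVE_D1)
print(round(score, 2))
```

In this toy case the positive features are above the reference means and the negative ones below, so the score is positive, which is the direction associated with involved production on Dimension 1.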
Appendix 2. Face-to-Face Conversation Means Procedure (Multi-Dimensional Analysis)38
38 Means per 1,000 words.
Appendix 3. Movie Conversation Means Procedure (Multi-Dimensional Analysis)39
39 Means per 1,000 words.
Appendix 4. Movie Conversation Feature Counts (Multi-Dimensional Analysis)40
40 Means per 1,000 words.
Appendix 5. Multi-Dimensional Analysis of Borderline Genre Movies
➢ Dimension 1: INVOLVED PRODUCTION, but less involved than comedies and
non-comedies
• Lower occurrences of first person pronouns and possessives, pronoun ‘it’,
discourse particles, adverbial hedges, qualifier – amplifier adverbs, and
wh-questions;
• Higher occurrence of prepositions.
➢ Dimension 2: NON-NARRATIVE CONCERNS, but more narrative than comedies
and non-comedies
• Higher occurrences of past tenses and public verbs.
➢ Dimension 3: SITUATION-DEPENDENT REFERENCE, but less dependent than
comedies and non-comedies
• Higher occurrence of wh-clauses;
• Lower occurrence of time adverbs.
➢ Dimension 4: OVERT EXPRESSION OF PERSUASION, but characterized by less
overt expression of persuasion than comedies and non-comedies
• Lower occurrence of modals of necessity and adverbs within auxiliary.
➢ Dimension 5: ABSTRACT INFORMATION with a score which is halfway between
comedies and non-comedies
➢ COMPARISON WITH FACE-TO-FACE CONVERSATION: Dimensions 2 and 4 display
more similarity and Dimension 3 displays greater difference
References
Aarts, Bas. 2001. Corpus linguistics, Chomsky and fuzzy tree fragments. In Corpus
linguistics and linguistic theory, ed. Christian Mair and Marianne Hundt.
Amsterdam / Atlanta: Rodopi, 5–13.
Aijmer, Karin and Anna-Brita Stenström. 2005. Approaches to spoken interaction.
Journal of Pragmatics 37: 1743–1751.
Atkinson, Dwight. 2001. Scientific discourse across history: A combined multi-
dimensional / rhetorical analysis of the philosophical transactions of the Royal
Society of London. In Variation in English: Multi-dimensional studies, ed. Susan
Conrad and Douglas Biber. London: Longman, 45–65.
Austin, John Langshaw. 1962. How to do things with words. Oxford: Clarendon Press.
Baccolini, Raffaella and Rosa Maria Bollettieri Bosinelli, eds. 1994. Il doppiaggio:
trasposizioni linguistiche e culturali. Bologna: CLUEB.
Bazzanella, Carla. 1990. Phatic connectives as intonational cues in contemporary
spoken Italian. Journal of Pragmatics 14(4): 629–647.
Bazzanella, Carla. 1999. Forme di ripetizione e processi di comprensione nella
conversazione. In La conversazione. Un’introduzione allo studio della conversazione
verbale, ed. Renata Galatolo and Gabriele Pallotti. Milano: Raffaello Cortina
Editore, 205–225.
Bercelli, Fabrizio. 1999. Analisi conversazionale e analisi dei frame. In La
conversazione. Un’introduzione allo studio della conversazione verbale, ed. Renata
Galatolo and Gabriele Pallotti. Milano: Raffaello Cortina Editore, 89–118.
Bettetini, Gianfranco. 2002 (4th edition). La conversazione audiovisiva. Problemi
dell’enunciazione filmica e televisiva. Milano: Studi Bompiani.
Bernardini, Silvia. 2004. Corpora in the classroom: An overview and some
reflections on future developments. In How to use corpora in language teaching,
ed. John McHardy Sinclair. Amsterdam / Philadelphia: John Benjamins, 15–36.
Biber, Douglas. 1985. Investigating macroscopic textual variation through multi-
feature / multi-dimensional analyses. Linguistics 23: 337–60.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge
University Press.
Biber, Douglas. 1993. Representativeness in corpus design. Literary and Linguistic
Computing 8(4): 243–57.
Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison.
Cambridge: Cambridge University Press.
Biber, Douglas. 2004. Conversation text types: A multi-dimensional analysis. 7es
Journées internationales d’Analyse statistique des Données Textuelles JADT’04,
<http://www.cavi.univ-paris3.fr/lexicometrica/jadt/jadt2004/pdf/JADT_000.pdf>.
Last accessed on 27/07/2011.
Biber, Douglas. 2006. University language: A corpus-based study of spoken and writ-
ten registers. Amsterdam / Philadelphia: John Benjamins.
Biber, Douglas. 2009. A corpus-driven approach to formulaic language in English.
Multi-word patterns in speech and writing. International Journal of Corpus Lin-
guistics 14(3): 275–311.
Biber, Douglas and Edward Finegan. 1986. An initial typology of English texts. In
New studies in the analysis and exploitation of computer corpora, ed. Jan Aarts
and Willem Meijs. Amsterdam: Rodopi, 19–45.
Biber, Douglas and Edward Finegan. 2001a. Diachronic relations among speech-
based and written registers. In Variation in English: Multi-dimensional studies,
ed. Susan Conrad and Douglas Biber, 66–83. London: Longman.
Biber, Douglas and Edward Finegan. 2001b. Intra-textual variation within medical
research articles. In Variation in English: Multi-dimensional studies, ed. Susan
Conrad and Douglas Biber. London: Longman.
Biber, Douglas, Susan Conrad, and Randi Reppen. 1998. Corpus linguistics: inves-
tigating language structure and use. Cambridge: Cambridge University Press.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad, and Edward Finegan.
1999. Longman grammar of spoken and written English. London: Longman.
Bollettieri Bosinelli, Rosa Maria (ed). 1998. La traduzione multimediale: quale tradu-
zione per quale testo? Atti del convegno internazionale: La traduzione multi-
mediale. Bologna: CLUEB.
Börjars, Kersti. 2006. Description and theory. In The handbook of English
linguistics, ed. Bas Aarts and April McMahon. Malden: Blackwell, 9–32.
Brown, Penelope and Stephen Levinson. 1987. Politeness: Some universals in lan-
guage usage. Cambridge: Cambridge University Press.
Bruti, Silvia. 2006. Cross-cultural pragmatics: the translation of implicit
compliments in subtitles. JoSTrans, Issue 06,
<http://www.jostrans.org/issue06/art_bruti.php>. Last accessed on 27/07/2011.
Bruti, Silvia and Elisa Perego. 2005. Translating the expressive function in subtitles:
the case of vocatives. In Research on translation for subtitling in Spain and Italy,
ed. John D. Sanderson. Alicante: Publicaciones de la Universidad de Alicante,
27–48.
Bubel, Claudia. 2008. Film audience as overhearers. Journal of Pragmatics 40: 55–71.
Cameron, Deborah. 2001. Working with spoken discourse. London: Sage Publica-
tions Ltd.
Carter, Ronald and Michael McCarthy. 2006. Cambridge grammar of English: A
comprehensive guide. Spoken and written English: Grammar and usage. Cam-
bridge: Cambridge University Press.
Cattrysse, Patrick. 2001. Multimedia & translation: Methodological considerations.
In (Multi)Media translation. Concepts, practices and research, ed. Henrik Gottlieb
and Yves Gambier, 1–12. Amsterdam / Philadelphia: John Benjamins.
Chafe, Wallace. 1982. Integration and involvement in speaking, writing, and oral
literature. In Spoken and written language: Exploring orality and literacy,
ed. Deborah Tannen. Norwood / New Jersey: Ablex Publishing Corporation,
35–53.
Chaume, Frederic. 2004a. Cine y traducción. Cátedra: Signo e Imagen.
Chaume, Frederic. 2004b. Discourse markers in audiovisual translating. Meta XLIX(4):
843–855, <http://www.erudit.org/revue/meta/2004/v49/n4/009785ar.pdf>.
Last accessed on 27/07/2011.
Chaume, Frederic. 2004c. Film studies and translation studies: Two disciplines at
stake in audiovisual translation. Meta XLIX(1): 12–24,
<http://www.erudit.org/revue/meta/2004/v49/n1/009016ar.pdf>. Last accessed on 27/07/2011.
Chomsky, Noam. 1957. Syntactic structures. Berlin: Mouton de Gruyter.
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge: The MIT Press.
Clark, Herbert H. and Edward F. Schaefer. 1992. Dealing with overhearers. In Arenas
of language use, ed. Herbert H. Clark. Chicago: University of Chicago Press,
248–273.
Conrad, Susan. 2001. Variation among disciplinary texts: a comparison of texts
about American nuclear arms policy. In Variation in English: Multi-dimensional
studies, ed. Susan Conrad and Douglas Biber, 84–93. London: Longman.
Contento, Silvana. 1999. Attività Bimodale: aspetti verbali e gestuali della
comunicazione. In La conversazione. Un’introduzione allo studio della conversazione
verbale, ed. Renata Galatolo and Gabriele Pallotti. Milano: Raffaello Cortina
Editore, 267–286.
Erman, Britt. 1987. Pragmatic Expressions in English: A study of you know, you see,
and I mean in face-to-face conversation. Stockholm: Almqvist & Wiksell Inter-
national.
Firth, John Rupert. 1935a. The technique of semantics. Transactions of the Philologi-
cal Society 36–72.
Firth, John Rupert. 1935b. The use and distribution of certain English sounds. Eng-
lish Studies xvii(I): 8–18.
Firth, John Rupert. 1951a. General linguistics and descriptive grammar. Transactions
of the Philological Society 216–228.
Firth, John Rupert. 1951b. Modes of meaning. In Essays and Studies, ed. John
Rupert Firth. English Association, 118–149.
Firth, John Rupert. 1957a. A synopsis of linguistic theory, 1930–1955. In Studies
in Linguistic Analysis, ed. John Rupert Firth et al. Special volume of the Philo-
logical Society. Oxford: Blackwell, 1–32.
Firth, John Rupert. 1957b. Papers in linguistics 1934–1951. London: Oxford Uni-
versity Press.
Forchini, Pierfranca. 2009. Spontaneity reloaded: American face-to-face and movie
conversation compared. Corpus Linguistics 2009 [Liverpool, July 21–23 2009]
online proceedings <http://ucrel.lancs.ac.uk/publications/cl2009/>. Last ac-
cessed on 27/07/2011.
Forchini, Pierfranca. 2010. Well, uh no. I mean, you know. Discourse markers in
movie conversation. In Perspectives on Audiovisual Translation, ed. Lukasz
Bogucki. Bern: Peter Lang, 45–59.
Forchini, Pierfranca. Forthcoming. Movie conversation: a reflection of face-to-face
conversation and a source for teaching spoken language. In Papers from the
XXIV AIA Conference Proceedings, eds. Gabriella Di Martino, Linda Lombardo,
Silvia Nuccorini, Edizioni Q: Roma.
Forchini, Pierfranca and Murphy, Amanda. 2010. 4-grams in comparable special-
ized corpora: perspectives on phraseology, translation, and pedagogy. In Pat-
terns, meaningful units and specialized discourses, eds. Ute Römer, Rainer
Schulze. Amsterdam / Philadelphia: Benjamins Current Topics, 87–103.
Ford, Cecilia A., Barbara A. Fox, and Sandra A. Thompson (eds.). 2002. The
language of turn and sequence. Oxford: Oxford University Press.
Francis, Gill. 1993. A corpus-driven approach to grammar: principles, methods and
examples. In Text and technology. In honour of John Sinclair, ed. Mona Baker,
Gill Francis and Elena Tognini-Bonelli. Amsterdam / Philadelphia: John Benja-
mins, 137–156.
Gavioli, Laura. 1999. Alcuni meccanismi di base dell’analisi della conversazione.
In La conversazione. Un’introduzione allo studio della conversazione verbale, ed.
Renata Galatolo and Gabriele Pallotti. Milano: Raffaello Cortina Editore, 43–66.
Goffman, Erving. 1976. Replies and responses. Language in Society 5: 257–313.
Goffman, Erving. 1979. Footing. Semiotica 25: 1–29.
Gottlieb, Henrik and Yves Gambier, eds. 2001. Multi-media translation: concepts,
practices, and research. Amsterdam / Philadelphia: John Benjamins.
Gregory, Michael and Suzanne Carroll. 1978. Language and situation: Language va-
rieties and their social contexts. London: Routledge & Kegan Paul.
Grice, Paul Herbert. 1975. Logic and Conversation. In Speech acts (Syntax and Se-
mantics, Vol. 3), ed. Peter Cole and Jerry L. Morgan. New York: Academic
Press, 41–58.
Halliday, Michael Alexander Kirkwood. 1985a. An introduction to functional gram-
mar. London: Arnold.
Halliday, Michael Alexander Kirkwood. 1985b. Spoken and written language. Ox-
ford: Oxford University Press.
Halliday, Michael Alexander Kirkwood. 1987. Spoken and written modes of
meaning. In Comprehending oral and written language, ed. Rosalind Horowitz and
Jay Samuels. Orlando: Academic Press, 55–82.
Halliday, Michael Alexander Kirkwood. 1993. Quantitative studies and probabili-
ties in grammar. In Data, description, discourse. Papers on the English language
in honour of John Sinclair, ed. Michael Hoey. London: HarperCollins, 1–25.
Halliday, Michael Alexander Kirkwood. 2003a. Introduction: On the “architecture”
of human language. In On language and linguistics, ed. Jonathan J. Webster.
London / New York: Continuum, 1–32.
Halliday, Michael Alexander Kirkwood. 2003b (first printed in 1985). Systemic
background. In On language and linguistics, ed. Jonathan J. Webster. London /
New York: Continuum, 185–198.
Halliday, Michael Alexander Kirkwood. 2003c (first printed in 1992). Systemic
grammar and the concept of a “Science of Language”. In On language and
linguistics, ed. Jonathan J. Webster. London / New York: Continuum, 199–
212.
Halliday, Michael Alexander Kirkwood. 2005 (first printed in 2002). The spoken
language corpus: a foundation for grammatical theory. In Computational and
quantitative studies, ed. Jonathan J. Webster. London / New York: Continuum,
157–190.
Helt, Marie E. 2001. A multi-dimensional comparison of British and American
spoken English. In Variation in English: Multi-dimensional studies, ed. Susan
Conrad and Douglas Biber. London: Longman.
Higgins, John. 1991. Looking for patterns. In Classroom concordancing, ed. Tim
Johns and Philip King. Birmingham: Birmingham University Press, 4: 63–70.
Hoffmann, Sebastian. 2004. Are low-frequency complex prepositions grammatical-
ized? On the limits of corpus-data – and the importance of intuition. In Corpus
approaches to grammaticalization in English, ed. Hans Lindquist and Christian
Mair. Amsterdam / Philadelphia: John Benjamins, 171–210.
Hunston, Susan. 2002. Corpora in applied linguistics. Cambridge: Cambridge Uni-
versity Press.
Hunston, Susan. 2006. Phraseology and system: A contribution to the debate. In
System and Corpus: Exploring Connections, ed. Susan Hunston and Geoff
Thompson. London: Equinox Publishing, 55–80.
Johansson, Stig. 1993. Some aspects of the recommendations of the Text Encod-
ing Initiative, with special reference to the encoding of language corpora. In
Corpora Across Centuries, ed. Merja Kytö, Susan Wright, and Matti Rissanen.
Amsterdam / Atlanta: Rodopi, 203–210.
Johansson, Stig. 2007. Seeing through multilingual corpora. In Corpus linguistics
25 years on, ed. Roberta Facchinetti. Amsterdam / New York: Rodopi, 51–72.
Kennedy, Graeme. 1998. An introduction to corpus linguistics. London / New York:
Longman.
Mahlberg, Michaela. 2006. But it will take time… points of view on a lexical gram-
mar of English. In The changing faces of corpus linguistics, ed. Antoinette Renouf
and Andrew Kehoe. Amsterdam / New York: Rodopi, 377–390.
Mansfield, Gillian. 2006. Changing channels. Media language in (inter)action.
Milano: LED.
Mauranen, Anna. 2004. Spoken corpus for an ordinary learner. In How to use
corpora in language teaching, ed. John McHardy Sinclair. Amsterdam /
Philadelphia: John Benjamins, 89–105.
May, Renato. 1962. Cinema e linguaggio. Brescia: La Scuola Editrice.
McCarthy, Michael. 1998. Spoken language and applied linguistics. Cambridge: Cam-
bridge University Press.
McCarthy, Michael. 1999. What constitutes a basic vocabulary for spoken com-
munication? Studies in English language and literature 1: 233–249.
McEnery, Tony and Andrew Wilson. 1996. Corpus linguistics. Edinburgh: Edin-
burgh University Press.
Miller, Jim. 2006. Spoken and Written English. In The handbook of English linguistics,
ed. Bas Aarts and April McMahon. Malden / Oxford: Blackwell, 670–691.
Miller, Jim and Regina Weinert. 1998. Spontaneous spoken language. Oxford:
Clarendon.
Menarini, Alberto. 1955. Il cinema nella lingua la lingua del cinema. Milano / Roma:
Fratelli Bocca Editori.
Morandini, Laura, Luisa Morandini, and Morando Morandini. 2006. Il Morandini
2007: Dizionario dei film. Bologna: Zanichelli.
Nencioni, Giovanni. 1976. Parlato-parlato, parlato-scritto, parlato-recitato. Stru-
menti linguistici, 29: 1–56.
Nencioni, Giovanni. 1983. Di scritto e di parlato. Discorsi linguistici. Bologna: Zani-
chelli.
Partington, Alan. 1998. Patterns and meanings. Using corpora for English language
research. Amsterdam / Philadelphia: John Benjamins.
Pavesi, Maria. 1994. Osservazioni sulla linguistica del doppiaggio. In Il doppiaggio:
trasposizioni linguistiche e culturali, ed. Raffaella Baccolini and Rosa M. Bol-
lettieri Bosinelli. Bologna: CLUEB, 129–142.
Pavesi, Maria. 1996. L’allocuzione nel doppiaggio dall’inglese all’italiano. In Tradu-
zione multimediale per il cinema, la televisione e la scena. Atti del convegno inter-
nazionale (Forlì, October 26–28 1995). ed. Christine Heiss, Rosa M. Bollettieri
Bosinelli, 117–130. Bologna: CLUEB.
Pavesi, Maria. 2005. La Traduzione Filmica. Aspetti del parlato doppiato dall’inglese
all’italiano. Roma: Carocci.
Pavesi, Maria and Annalisa Malinverno. 2000. Sul turpiloquio nella traduzione
filmica. In Tradurre il cinema, ed. Christopher Taylor. Trieste: La Stea, 75–90.
Quaglio, Paulo. 2009. Television Dialogue. The sitcom Friends vs. natural conversa-
tion. Amsterdam / Philadelphia: John Benjamins.
Quaglio, Paulo and Douglas Biber. 2006. The grammar of conversation. In The
handbook of English linguistics, ed. Bas Aarts and April McMahon, 692–723.
Malden / Oxford: Blackwell.
Redeker, Gisela. 2006. Discourse markers as attentional cues at discourse
transitions. In Approaches to discourse particles, ed. Kerstin Fischer. Amsterdam:
Elsevier, 339–358.
Remael, Aline. 2001. Some thoughts on the study of multimodal and multimedia
translation. In (Multi)Media Translation. Concepts, practices and research, ed.
Henrik Gottlieb and Yves Gambier. Amsterdam / Philadelphia: John Benjamins,
13–22.
Renouf, Antoinette. 1997. Teaching corpus linguistics to teachers of English. In
Teaching and language corpora, ed. Anne Wichmann, Steven Fligelstone, Tony
McEnery, and Gerry Knowles. London / New York: Longman, 255–266.
Renouf, Antoinette. 2007. Corpus linguistics 25 years on: from super-corpus to
cyber-corpus. In Corpus linguistics 25 years on, ed. Roberta Facchinetti. Amster-
dam / New York: Rodopi, 27–50.
Reppen, Randi. 2001a. Register variation in student and adult speech and writ-
ing. In Variation in English: Multi-dimensional studies, ed. Susan Conrad and
Douglas Biber. London: Longman, 187–199.
Reppen, Randi. 2001b. Review of MonoConc Pro and WordSmith Tools. Language
Learning & Technology 5(3): 32–36.
Reppen, Randi. 2010. Using Corpora in the Language Classroom. Cambridge:
Cambridge University Press.
Rey, Jennifer M. 2001. Changing gender roles in popular culture: Dialogue in Star
Trek episodes from 1966 to 1993. In Variation in English: Multi-dimensional
studies, ed. Susan Conrad and Douglas Biber, 138–155. London: Longman.
Rossi, Alessandra. 2003. La lingua del cinema. In La lingua italiana e i mass me-
dia, ed. Ilaria Bonomi, Andrea Masini and Silvia Morgana. Roma: Carocci
Editore, 93–126.
Sacks, Harvey. 1992. Lectures on conversation. Oxford: Blackwell.
Schiffrin, Deborah. 1987. Discourse markers. Cambridge: Cambridge University
Press.
Scott, Mike. 1998. WordSmith Tools. Oxford: Oxford University Press.
<http://www.lexically.net/wordsmith/step_by_step/index.html>. Last accessed on 27/07/2011.
Scott, Mike and Chris Tribble. 2006. Textual patterns. Amsterdam / Philadelphia:
John Benjamins.
Searle, John R. 1969. Speech acts. Cambridge: Cambridge University Press.
Sinclair, John McHardy. 1991. Corpus, concordance, collocation. Oxford: Oxford
University Press.
Sinclair, John McHardy. 1996. The search for units of meaning. Textus IX: 75–106.
Sinclair, John McHardy. 1999. A way with common words. In Out of corpora: studies
in honour of Stig Johansson, ed. Hilde Hasselgård and Signe Oksefjell. Amster-
dam: Rodopi, 157–179.
Sinclair, John McHardy. 2004a. Trust the text: Language, corpus and discourse. Lon-
don / New York: Routledge.
Sinclair, John McHardy. 2004b (first printed in 1987). Corpus creation. In Corpus
linguistics: Readings in a widening discipline, ed. Geoffrey Sampson and Diana
McCarthy. London / New York: Continuum, 78–84.
Sinclair, John McHardy. 2004c. How to use corpora in language teaching. Amster-
dam / Philadelphia: John Benjamins.
Sinclair, John McHardy. 2006. The case for a Corpus. Seminar given for the De-
partment of Foreign Languages and Literatures, Università Cattolica del Sacro
Cuore, Milan.
Stame, Stefania. 1999. I marcatori della conversazione. In La conversazione.
Un’introduzione allo studio della conversazione verbale, ed. Renata Galatolo and
Gabriele Pallotti. Milano: Raffaello Cortina Editore, 169–186.
Stern, Karen. 2005. The Longman Spoken American Corpus: providing an in-depth
analysis of everyday English, Pearson Longman,
<http://www.pearsonlongman.com/dictionaries/pdfs/Spoken-American.pdf>.
Stubbs, Michael. 1996. Text and corpus analysis: Computer assisted studies of lan-
guage and institutions. Oxford / Massachusetts: Blackwell.
Stubbs, Michael. 2001. Words and phrases: Corpus studies in lexical semantics. Ox-
ford / Massachusetts: Blackwell.
Stubbs, Michael. 2007. An example of frequent English phraseology: distributions,
structures and functions. In Corpus linguistics 25 years on, ed. Roberta
Facchinetti. Amsterdam / New York: Rodopi, 89–106.
Svartvik, Jan. 2007. Corpus linguistics 25 years on. In Corpus linguistics 25 years
on, ed. Roberta Facchinetti. Amsterdam / New York: Rodopi, 11–26.
Tannen, Deborah. 1982. The oral / literate continuum in discourse. In Spoken and
written language: Exploring orality and literacy, ed. Deborah Tannen, 1–16.
Norwood / New Jersey: Ablex Publishing Corporation.
Taylor, Christopher. 1999. Look who’s talking. An analysis of film dialogue as a
variety of spoken discourse. In Massed medias. Linguistic tools for interpreting
media discourse, ed. Linda Lombardo, Louann Haarman, John Morley, and
Christopher Taylor. Milano: LED, 247–278.
Taylor, Christopher. 2000a. In Defence of the Word: Subtitles as Conveyors of
Meaning and Guardians of Culture. In La traduzione multimediale. Quale tra-
duzione per quale testo?, ed. Rosa M. Bollettieri Bosinelli, Christine Heiss,
Marcello Soffritti, and Silvia Bernardini. Bologna: CLUEB, 153–166.
Taylor, Christopher, ed. 2000b. Tradurre il cinema. Atti del convegno organizzato
da G. Soria e C. Taylor 29–30 novembre 1996. Trieste: Università degli Studi
di Trieste.
Taylor, Christopher. 2000c. The subtitling of film; reaching another community.
In Discourse and community; doing functional linguistics, ed. Eija Ventola. Tü-
bingen: Gunter Narr Verlag, 309–330.
Taylor, Christopher. 2003. Multimodal transcription in the analysis, translation and
subtitling of Italian films. The Translator, Special Issue, 9(2): 191–208.
Taylor, Christopher and Anthony Baldry. 2004. Multimodal concordancing and
subtitles with MCA. In Corpora and discourse, ed. Alan Partington, John
Morley, and Louann Haarman. Bern: Peter Lang, 57–70.
Thomas, Jenny. 1995. Meaning in interaction: An introduction to pragmatics. Lon-
don: Longman.
Tognini-Bonelli, Elena. 2001. Corpus linguistics at work. Amsterdam / Philadelphia:
John Benjamins.
Ulrych, Margherita. 1999. Focus on the translator in a multidisciplinary perspective.
Padova: Unipress.
Wichmann, Anne. 2007. Corpora and spoken discourse. In Corpus linguistics 25 years
on, ed. Roberta Facchinetti. Amsterdam / New York: Rodopi, 73–88.
Wynne, Martin. 2004. Developing linguistic corpora: A guide to good practice.
London: AHDS. <http://www.ahds.ac.uk/creating/guides/linguistic-corpora/chapter6.htm>.
Last accessed on 27/07/2011.
Additional Online Material
(Last accessed on 27/07/2011)
Catwoman movie script. The Daily Script. <http://www.dailyscript.com/scripts/catwoman.pdf>.
Phrases in English by William Fletcher. <http://phrasesinenglish.org/>.
Internet Movie Database. <http://www.imdb.com/>.
Linguistic Data Consortium. <http://www.ldc.upenn.edu/About/>.
LDC guide to conventions. Linguistic Data Consortium.
<http://projects.ldc.upenn.edu/SBCSAE/transcription/csae-conventions.html#ortho>.
Mission: Impossible II movie script. AwesomeFilm.com. <http://www.awesomefilm.com/script/MI2.html>.
Shallow Hal movie script. Drew’s Script-O-Rama.
<http://www.script-o-rama.com/movie_scripts/s/shallow-hal-script-transcript-paltrow.html>.
The Devil Wears Prada movie script. The Daily Script. <www.dailyscript.com/scripts/devil_wears_prada.pdf>.
AMC Transcribed Movies
Comar, Jean-Christophe (alias Pitof). 2004. Catwoman. Warner Bros.
Farrelly, Bobby and Peter Farrelly. 2000. Me, Myself & Irene. 20th Century Fox.
Farrelly, Bobby and Peter Farrelly. 2001. Shallow Hal. 20th Century Fox.
Frankel, David. 2006. The Devil wears Prada. 20th Century Fox.
Roach, Jay. 2000. Meet the parents. Universal Studios and DreamWorks.
Romanek, Mark. 2002. One hour photo. Fox Searchlight Pictures.
Soderbergh, Steven. 2000. Erin Brockovich. Universal Pictures and Columbia Pictures.
Soderbergh, Steven. 2001. Ocean’s eleven. Warner Bros.
Van Sant, Gus. 2000. Finding Forrester. Columbia Pictures.
Wachowski, Andrew Paul and Laurence Wachowski. 2003. The matrix reloaded.
Warner Bros.
Woo, John. 2000. Mission: Impossible II. Paramount Pictures and United International
Pictures.
LINGUE E CULTURE
This series, edited by the Department of Language Sciences and Foreign Literatures of the
Università Cattolica del Sacro Cuore in Milan, intends to publish scholarly reflections on the
languages and literatures taught within this Languages and Literatures Faculty.
The series is rooted in a tradition of studies which are both philologico-literary and linguistic
– a combination of approaches designed to be both rigorous and complementary. The
themes of the series will focus on linguistic, stylistic and literary studies related to both
European and extra-European cultures. The series will include mostly monographs and
doctoral theses.
01 Pierfranca Forchini / Movie Language Revisited: Evidence from Multi-Dimensional Analysis and Corpora
142 pages / ISBN 978-3-0343-1076-5 / 2012
