Discourse On The Move

Studies in Corpus Linguistics (SCL)

SCL focuses on the use of corpora throughout language study, the development
of a quantitative approach to linguistics, the design and use of new tools for
processing language texts, and the theoretical implications of a data-rich
discipline.
General Editor
Elena Tognini-Bonelli, The Tuscan Word Center/The University of Siena
Consulting Editor
Wolfgang Teubert, University of Birmingham
Advisory Board
Michael Barlow, University of Auckland
Graeme Kennedy, Victoria University of Wellington
Douglas Biber, Northern Arizona University
Geoffrey N. Leech, University of Lancaster
Marina Bondi, University of Modena and Reggio Emilia
Anna Mauranen, University of Helsinki
Christopher S. Butler, University of Wales, Swansea
Ute Römer, University of Hannover
Sylviane Granger, University of Louvain
Michaela Mahlberg, University of Liverpool
M.A.K. Halliday, University of Sydney
Jan Svartvik, University of Lund
Susan Hunston, University of Birmingham
John M. Swales, University of Michigan
Stig Johansson, Oslo University
Yang Huizhong, Jiao Tong University, Shanghai
Volume 28
Discourse on the Move. Using corpus analysis to describe discourse structure
Douglas Biber, Ulla Connor and Thomas A. Upton
Using corpus analysis
to describe discourse structure
Douglas Biber
Northern Arizona University
Ulla Connor
Thomas A. Upton
Indiana University Indianapolis
John Benjamins Publishing Company

Amsterdam/Philadelphia
The paper used in this publication meets the minimum requirements of
American National Standard for Information Sciences – Permanence of
Paper for Printed Library Materials, ANSI Z39.48-1984.
Library of Congress Cataloging-in-Publication Data
Biber, Douglas.
Discourse on the move : using corpus analysis to describe discourse structure / Douglas
Biber, Ulla Connor, Thomas A. Upton.
p. cm. (Studies in Corpus Linguistics, ISSN 1388-0373 ; v. 28)
Includes bibliographical references and index.
1. Discourse analysis--Data processing. I. Connor, Ulla, 1948- II. Upton, Thomas
A. (Thomas Albin) III. Title.
P302.3.B53 2007
401'.41--dc22 2007029145
ISBN 978 90 272 2302 9 (Hb; alk. paper)
© 2007 John Benjamins B.V.

No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any
other means, without written permission from the publisher.
John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA
Table of contents
Preface
chapter 1
Discourse analysis and corpus linguistics
1 Discourse and discourse analysis
1.1 Discourse studies of language use
1.2 Discourse studies of linguistic structure beyond the sentence
1.3 Discourse studies of social practices and ideological assumptions associated with communication
1.4 Register and genre perspectives on discourse
1.5 Identifying structural units in discourse
2 Corpus-based investigation of discourse structure
3 Top-down versus bottom-up corpus-based approaches to discourse analysis
3.1 Examples of top-down analyses of discourse
3.2 Example of bottom-up approach
4 Creating a specialized corpus for discourse analysis
5 Overview of the book
Part 1. Top-down analyses of discourse organization
chapter 2
Introduction to move analysis
WITH Budsaba Kanoksilapatham
1 Background
2 Swales' move analysis of research articles
3 Move analysis of research articles applied across genres
3.1 Description and examples
3.2 Summary of previous research on move analysis
4 Overview of the methods for move analysis
4.1 General steps of a move analysis
4.2 Inter-rater reliability
5 Using a corpus-based approach to move analysis
5.1 Corpus-based move analysis
5.2 General advantages of corpus-based approaches to discourse analysis
5.3 Specific advantages of a corpus-based perspective for move analysis
5.3.1 Identifying linguistic features of moves
5.3.2 Move frequencies and lengths
5.3.3 Mapping move use and locations
5.3.4 Genre prototypes
6 Summary
chapter 3
Identifying and analyzing rhetorical moves in philanthropic discourse
1 Background
2 A specialized corpus of fundraising texts
3 Determining and analyzing discourse moves: Direct mail letters
3.1 Previous analysis of direct mail letters
3.2 A move analysis of fundraising letters: Background and methodology
3.2.1 Move types
3.2.2 Structural elements
3.3 Analysis
3.4 Results
3.5 Discussion
3.6 Letter prototypes
4 Linguistic analysis of moves: Tracking the use of stance structures
4.1 Identifying grammatical stance devices
4.2 Interpreting the use of grammatical stance devices used in moves
5 Final thoughts
chapter 4
Rhetorical moves in biochemistry research articles
BY Budsaba Kanoksilapatham
1 Background
2 Description of the corpus
3 Determining the move categories in the genre of biochemistry research articles
3.1 The introduction section
3.2 The methods section
3.3 The results section
3.4 The discussion section
4 Coding moves in the corpus of biochemistry research articles
5 Distribution of move types within texts from the biochemistry corpus
6 Linguistic characteristics of rhetorical moves in biochemistry research articles
7 Linguistic variation among move categories in biochemistry research articles
8 Multi-dimensional variation among move types within the same section
chapter 5
Rhetorical appeals in fundraising
WITH Molly Anthony & Kostyantyn Gladkov
1 Elements of persuasion
2 Determining and analyzing rhetorical appeals
2.1 Rational appeals (Logos)
2.2 Credibility appeals (Ethos)
2.3 Affective appeals (Pathos)
3 Analysis, segmentation, and classification
3.1 Results and discussion
4 Linguistic description of appeals
4.1 Wordlists
4.2 Keywords
5 Appeals and discourse structure of letters
6 Conclusion
Part 2. Bottom-up analyses of discourse organization
chapter 6
Introduction to the identification and analysis of vocabulary-based discourse units
WITH Eniko Csomay, James K. Jones, & Casey Keck
1 Conceptual introduction to VBDUs
2 Automatic identification of VBDUs in texts
3 Perceptual correlates of VBDUs
4 Using VBDUs to analyze the discourse structure of texts
5 Going one step further: Identifying generalizable VBDU types
chapter 7
Vocabulary-based discourse units in biology research articles
WITH James K. Jones
1 Constructing the corpus of VBDUs
2 Analyzing the linguistic characteristics of VBDUs: Multi-dimensional analysis
3 Comparing the multi-dimensional characteristics of research article sections
4 The multi-dimensional profile of VBDUs within a research article: Tracking the movement of discourse
5 Identifying and interpreting the multi-dimensional text types of biology research articles
6 Using VBDU text types to describe the discourse organizational patterns of biology research articles
7 Starting and ending research article sections
7.1 Describing the typical discourse organizations of introductions
7.2 Describing the typical discourse organizations of methods sections
7.3 Describing the typical discourse organizations of discussion sections
8 Preferred text type sequences across research article section boundaries
9 Comparing the preferred discourse styles of research journals
10 Conclusion
chapter 8
Vocabulary-based discourse units in university class sessions
BY Eniko Csomay
1 From constructing a corpus of VBDUs to identifying VBDU text-types
1.1 Constructing a corpus of VBDUs
1.2 Analyzing the linguistic characteristics of VBDUs applying MD analytical techniques
1.3 VBDUs and dimension scores: the multi-dimensional profile of the first three VBDUs of a business management class
2 Dimension scores and VBDU text-types
2.1 Interpreting the clusters as VBDU types based on their linguistic characteristics
2.1.1 Cluster 1: Personalized framing
2.1.2 Cluster 2: Informational monologue
2.1.3 Cluster 3: Contextual interactive
2.1.4 Cluster 4: Unmarked
3 From VBDU text-types to discourse structure
3.1 Functional interpretation of VBDU types
3.2 Text as sequences of VBDU types
4 Summary and conclusion
chapter 9
Conclusion: Comparing the analytical approaches
1 Overview
2 Comparing the top-down and bottom-up descriptions of biology research articles
2.1 Discourse units in biology research articles
2.2 The dimensions of linguistic variation in biology research articles
2.3 The functional and linguistic characteristics of the discourse types (move types vs VBDU types) in biology research articles
2.4 Description of the typical discourse organization of biology research articles
3 Summary and prospects for future research
Appendix 1
A brief introduction to multi-dimensional analysis
A.1 Conceptual introduction to the multi-dimensional approach to variation
A.2 Overview of methodology in the multi-dimensional approach
Appendix 2
Grammatical and lexico-grammatical features included in the multi-dimensional analyses
References
Index
Preface
The idea for this book evolved slowly, emerging from research taking place at sev-
eral institutions applying different approaches to a single research problem: can
discourse structure and organization be investigated from a corpus perspective?
At Northern Arizona University (NAU), research on this topic began in a PhD
seminar in 1999. Inspired by the research of Youmans (1991; 1994) on the Vo-
cabulary Management Profile, students in that seminar explored ways in which
the discourse structure of a text can be discovered automatically by tracking the
text-internal use of vocabulary and other linguistic features. This initial effort re-
sulted in a PhD dissertation by Csomay (2002), followed by several other research
studies undertaken at NAU that employed the TextTiling methods originally de-
veloped by Hearst (1997).
Over the same period, researchers at Indiana University–Purdue University
Indianapolis (IUPUI) and Georgetown University were exploring a completely
different approach to this same research problem: applying the framework of rhe-
torical move analysis, developed by Swales (1981; 1990) for the detailed analysis of
texts, to analyze the general rhetorical and linguistic patterns of discourse struc-
ture in a corpus. At IUPUI, this research effort focused primarily on philanthropic
discourse, especially grant proposals and fundraising letters. And at Georgetown
University, this research culminated in 2003 with the completion of a PhD disser-
tation by Kanoksilapatham (2003) on the discourse structure of biochemistry re-
search articles.
The actual idea for the present book came about as colleagues from these dif-
ferent institutions would get together at conferences and discuss their different
approaches to the study of discourse structure and organization from a corpus
perspective. We realized that there had been very little previous research done on
this topic, and that by combining and comparing our approaches, we could pro-
vide a relatively comprehensive overview of this emerging subfield.
Because the book grew out of relatively independent research efforts, each au-
thor has had different primary responsibilities. At the same time, we have been
eager to structure the book as a coherent treatment of this subject: an authored
book rather than an edited collection of articles. Thus, the three book authors
share equal responsibility for revising and editing all chapters, and ultimately the
content of all chapters. But on the other hand, each chapter has different primary
authors, including several co-authors in addition to the three book authors for
Chapters 1–3, 5–7, and 9. Two chapters are invited, single-authored contributions:
Chapter 4 by Kanoksilapatham and Chapter 8 by Csomay. The primary authors
for each chapter are as follows:
Chapter 1: Biber, Connor, Upton
Chapter 2: Connor, Upton, Kanoksilapatham
Chapter 3: Upton, Connor
Chapter 4: Kanoksilapatham
Chapter 5: Connor, Anthony, Gladkov, Upton
Chapter 6: Biber, Csomay, Jones, Keck
Chapter 7: Biber, Jones
Chapter 8: Csomay
Chapter 9: Biber, Connor, Upton
We would like to thank the numerous colleagues who have made useful sugges-
tions and criticisms over the years in relation to the various research projects that
come together in the present book. We also owe a special thanks to Eric Friginal,
Bethany Gray, Jack Grieve, Mark Johnson, Erkan Karabacak, YouJin Kim, Poonpon
Kornwipa, Jingjing Qin, Angkana Tongpoon, and Faith Young, the students
of ENG 707 (Seminar on Discourse) at Northern Arizona University in the fall of
2006, who read the entire book manuscript and made numerous useful comments
and suggestions (including the title for our book, suggested by Jack Grieve).
chapter 1
Discourse analysis and corpus linguistics
1 Discourse and discourse analysis
The study of discourse has become a major focus of research in many disciplines
of the humanities, social sciences, and information sciences. Because this area of
study can be approached from so many different perspectives, the terms discourse
and discourse analysis have come to be used in widely divergent ways.
Several introductory treatments survey the range of definitions given to the
term discourse (e.g., Jaworski & Coupland, 1999, pp. 1–7; Schiffrin, 1994,
pp. 23–43). Schiffrin, Tannen, and Hamilton (2001), in their introduction to
The Handbook of Discourse Analysis (p. 1), group previous definitions of discourse analysis
into three general categories: 1) the study of language use; 2) the study of linguistic
structure beyond the sentence; and 3) the study of social practices and ideological
assumptions that are associated with language and/or communication.
The object of study for these three approaches to discourse is increasingly re-
moved from the research goals of traditional structural linguistics. The study of
language use focuses on traditional linguistic constructs, such as phrase structures
and clause structures, but addresses the problem of why languages have structural
variants with nearly equivalent meanings (e.g., particle movement, as in "pick up
the book" versus "pick the book up"). By considering factors that are not strictly
structural, linguists are able to predict when one or another variant is likely to be used.
For example, the length of the direct object noun phrase is an important factor
predicting the likelihood of particle movement. Aspects of the discourse context
are often important for understanding linguistic variation, especially for linguistic
constructions that involve word order variation (such as passives, extraposition,
clefts, inversions, existential there, etc.). For example, writers will choose passive
voice rather than active voice depending on the topical relevance of the patient
noun phrase.
The study of linguistic structure beyond the sentence focuses on a larger ob-
ject of study: extended sequences of utterances or sentences, and how those texts
are constructed and organized in systematic ways. Although studies of this type
are removed from the traditional concerns of structural linguistics (which focuses
mostly on phrasal and clause syntax), the two share a primary focus on linguistic
form and how language structures are used for communication.
In contrast, the third approach to discourse is socio-cultural in orientation,
and generally not concerned with the description of particular texts or the analysis
of language structure and use. Socio-cultural approaches to discourse sometimes
focus on the actions of participants in particular communication events, and at
other times focus on the general characteristics of speech/discourse communities
in relation to issues such as power and gender. Although the socio-cultural ap-
proaches are obviously important for understanding the broader role of texts in
culture, they typically are not concerned with understanding the linguistic forms
used in those texts.
Corpus linguistic studies are generally considered to be a type of discourse
analysis because they describe the use of linguistic forms in context. For example,
words are described in terms of their typical collocates: the words that normally
occur in the discourse context. Grammatical variation is also described in terms of
the words and other grammatical structures that occur in the context. As such,
corpus linguistic research has fallen squarely under the first approach to discourse:
the study of language use.
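The notion of a collocate can be made concrete with a small sketch. The function below is only an illustration under stated assumptions: the name `collocates`, the window size, and the toy sentence are hypothetical and are not taken from any study cited here. It counts the words that occur within a fixed number of tokens on either side of a node word.

```python
from collections import Counter

def collocates(tokens, node, window=2):
    """Count words found within `window` tokens of each occurrence of `node`.

    Hypothetical helper for illustration; real collocation studies usually
    add an association statistic on top of raw co-occurrence counts.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts

# Toy data (an assumption for the example, not corpus data)
tokens = "pick up the book and pick the book up then pick up the pen".split()
counts = collocates(tokens, "pick", window=2)
print(counts.most_common(3))
```

Raw counts of this kind favor frequent function words, which is one reason collocation research typically ranks candidates with a statistical measure rather than frequency alone.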
However, it has been much less common to study discourse organization from
a corpus perspective. In fact, these two subfields have research goals and methods
that might be considered incompatible: The study of discourse organization
(linguistic structure beyond the sentence) is usually based on detailed analysis of a
single text, resulting in a qualitative linguistic description of the textual organiza-
tion. In contrast, corpus studies are based on analysis of all texts in a corpus, utiliz-
ing quantitative measures to identify the typical distributional patterns that occur
across texts.
In fact, individual texts often have no status whatsoever in corpus investiga-
tions. Instead, what we find are comparisons of the distributional patterns in one
sub-corpus to the patterns in a second sub-corpus. For example, Scott and Tribble
(2006) describe how we can compare the keywords of the spoken versus written
sub-corpora from the British National Corpus. Nesselhauf (2005, Chapter 3)
describes the deviant collocations in a corpus of learner English essays. And Römer
(2005, Chapter 4) documents the variants and distributional patterns of progressive
verb phrases in the spoken sub-corpora from the British National Corpus.
These studies are typical of corpus-based research on discourse: they describe the
typical patterns of language use, considering the systematic ways in which aspects
of the lexico-grammatical context tend to occur together with different linguistic
variants; but such corpus-based studies usually tell us nothing about the discourse
structure of particular texts.
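A comparison of two sub-corpora of the kind described above can be sketched with a keyness statistic. The example below uses the log-likelihood statistic, which is one common choice for keyword analysis; the text does not say which statistic the cited studies used, so this is an assumption, and the function name and all counts are hypothetical.

```python
import math

def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Log-likelihood keyness of a word in sub-corpus A versus sub-corpus B.

    freq_a, freq_b: occurrences of the word in each sub-corpus
    size_a, size_b: total number of tokens in each sub-corpus
    """
    total = freq_a + freq_b
    # Expected frequencies if the word were spread evenly across both
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Hypothetical counts: a word seen 120 times in 10,000 spoken tokens
# and 30 times in 10,000 written tokens
print(log_likelihood(120, 30, 10_000, 10_000))
```

A value of zero means the word is distributed in proportion to the sizes of the two sub-corpora; larger values indicate a stronger skew toward one of them. Note that the statistic describes the sub-corpus as a whole and says nothing about where in any single text the word occurs.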
We thus see this interface as one of the current challenges of corpus linguistics:
Is it possible to merge the analytical goals and methods of corpus linguistics with
those of discourse analysis that focuses on the structural organization of texts?
Can a corpus be analyzed to identify the general patterns of discourse organization
that are used to construct texts, and can individual texts be analyzed in terms of
the general patterns that result from corpus analysis? These are the central issues
that we take up in the present book.
1.1 Discourse studies of language use
The first major approach to discourse identified above, the study of language use,
has been carried out from several different perspectives, including research in
pragmatics, speech act theory, functional linguistics, variationist studies, and reg-
ister studies. These subfields all investigate how words and linguistic structures are
used in discourse contexts to express a range of meanings. Many of these ap-
proaches focus on the study of linguistic variation, showing how linguistic choice
is systematic and principled when considered in the larger discourse context.
There have been numerous studies of grammar and discourse over the last two
decades, as researchers have come to realize that the description of grammatical
function is as important as structural analysis. By studying linguistic variation in
naturally occurring discourse, researchers have been able to identify systematic
differences in the functional use of each variant. An early study of this type is
Prince (1978), who compares the discourse functions of WH-clefts and it-clefts.
Thompson and Schiffrin have carried out numerous studies in this research tradi-
tion; Thompson on detached participial clauses (1983), adverbial purpose clauses
(1985), omission of the complementizer that (S. Thompson & Mulac, 1991a,
1991b), relative clauses (Fox & Thompson, 1990); and Schiffrin on verb tense
(1981), causal sequences (1985b), and discourse markers (1985a, 1987). Other
more recent studies of this type include Ward (1990) on VP preposing, Collins
(1995) on dative alternation, and Myhill (1995; 1997) on modal verbs.
Most corpus-based research is discourse analytic in this sense, investigating
systematic patterns of language use across discourse contexts, generalized over all
the texts in a corpus (see, e.g., Biber, Conrad, & Reppen, 1998; McEnery, Xiao, &
Tono, 2006). The advantages of a corpus approach for the study of discourse, lexis,
and grammatical variation include the emphasis on the representativeness of the
text sample, and the computational tools for investigating distributional patterns
across discourse contexts. The recent edited volumes by Connor and Upton (2004b),
Meyer and Leistyna (2003), Lindquist and Mair (2004), and Sampson and McCa-
rthy (2004) provide good introductions to work of this type. There are also a number
of book-length treatments reporting corpus-based investigations of grammar and
discourse: for example, Aijmer (2002) on discourse particles, Collins (1991) on
clefts, Granger (1983) on passives, Mair (1990) on infinitival complement clauses,
Meyer (1992) on apposition, Römer (2005) on progressive verbs, Tottie (1991) on
negation, and several books on nominal structures (e.g., de Haan, 1989; Geisler,
1995; Johansson, 1995; Varantola, 1984). The Longman Grammar of Spoken and
Written English (1999) applies corpus-based analysis to a more comprehensive
grammatical description of English, showing how any grammatical feature can be
described for both structural characteristics and discourse patterns of use.
The recent book by Partington (2003) is interesting here in that it combines
corpus-based study with an analysis of pragmatics, to investigate the discourse
features of White House briefings. A corpus of 48 briefings (250,000 words of run-
ning texts) was subjected to computerized concordance and keyword analysis.
However, the computational analyses were guided by detailed qualitative analysis:
"a summer reading the corpus briefings and making notes" (p. 12). This allowed
Partington to check on oddities of computerized collocation analysis, highlighting
odd language usage that computerized analysis might not have revealed.
A more specialized corpus-based approach to the study of language use is
multi-dimensional (MD) analysis. Unlike most corpus-based research, MD stud-
ies investigate language use in individual texts. This approach describes how lin-
guistic features co-occur in each text, resulting in more general patterns of linguis-
tic co-occurrence that hold across all texts of a corpus. The approach can thus be
used to show how patterns of linguistic features vary across individual texts, or
across registers and genres. MD analysis is used in several chapters in the present
book, and so it is introduced more fully in Appendix One.
1.2 Discourse studies of linguistic structure beyond the sentence
The second major approach to discourse analysis identified above, the study of
linguistic structure beyond the sentence, is the primary focus of the present
book. Previous research on discourse-level structures has been undertaken from
linguistic, cognitive, and computational perspectives.
Linguistic perspectives: Linguistic analyses of discourse structure have focused on
lexico-grammatical features that indicate the organization of discourse (see, e.g.,
the papers in Coulthard, 1994). Focusing on units beyond the sentence-level (e.g.,
paragraphs in written discourse and episodes in oral discourse), these researchers
investigate linguistic devices that signal the underlying discourse structure.
Much research of this type has described the discourse functions of particular
words and phrases, referred to as discourse markers, connectives, discourse par-
ticles (Schiffrin, 1994), lexical phrases (Hansen, 1994; Nattinger & DeCarrico,
1992), or cue phrases (Passonneau & Litman, 1996). Other studies discuss the
linguistic devices used to mark information structure, topical development, or
rhetorical structures in discourse (e.g., Mann, Matthiessen, & Thompson, 1992;
Mann & Thompson, 1988; Prince, 1981). Finally, some studies track the use of
linguistic devices across a text. For example, discourse maps are used to track
verb tense and voice patterns across the sections of research articles (Biber et al.,
1998, Chapter 5), while other studies track referential expressions used in
anaphoric chains throughout a text (e.g., Biber, 1992; Fox, 1987; Givón, 1983).
A related area of research is the study of textual cohesion: the use of lexical
and grammatical devices as the glue of a text, holding the text together as dis-
course rather than an accidental sequence of sentences (see, e.g., Halliday, 1989;
Halliday & Hasan, 1976; Hoey, 1991; Phillips, 1985; Tyler, 1995). Linguistic de-
vices used to establish cohesion include anaphoric pronouns, linking adverbials,
and the use of lexical repetition and synonymy to establish topical cohesion. Simi-
larly, Tannen (1989) found that repetitions in conversation operate as a kind of
theme-setting at the beginning of a topical unit and at the end, forming a kind of
coda (p. 69).
Cognitive perspectives: Cognitive investigations of discourse structure study the
factors that make a text coherent. Text coherence refers to the linking of ideas
within a text to create meaning for readers. Analyses of textual coherence typically
identify the propositions expressed in a text, the logical relations among those
propositions, and how listeners/readers are able to construct the overall textual
meaning in terms of those propositional relations. In contrast to the study of cohe-
sion, which refers to surface-level patterns, coherence entails the study of larger
discourse relationships. Many of these studies describe texts in terms of the coher-
ence relations expressed by clause-level propositions (Bateman & Rondhuis, 1997;
Dahlgren, 1996; Hobbs, 1979; Sanders, 1997; Sanders & Noordman, 2000). Related
studies also consider other factors that influence coherence, including differences
between subject versus presentational matter (Mann & Thompson, 1988), text
structural patterns like problem-solution (Connor, 1987) and given-new (theme-
rheme) structures (Cooper, 1988), and the semantic and pragmatic relations be-
tween units (Polanyi, 1985, 1988; Sanders, 1997). Several researchers have devel-
oped analytical frameworks for the study of coherence relations (e.g., Grosz &
Sidner, 1986; McNamara & Kintsch, 1996; Tomlin, Forrest, Ming Pu, & Hee Kim,
1997; Van Dijk, 1981, 1997; Van Dijk & Kintsch, 1983).
The ongoing flow of information is also central to coherence (Grabe & Kaplan,
1996). Studies have approached information flow from various perspectives, in-
cluding representations of the flow of thought (Chafe, 1994, 1997) or short-term
memory (Tomlin et al., 1997).
Computational perspectives: Computational studies of discourse organization have
attempted to model discourse organization for the purposes of information re-
trieval and natural language processing. Most computational studies of discourse
structure have focused on written texts. For example, Morris and Hirst (1991)
developed a lexical algorithm to find chains of related terms, which can be used to
describe the structure of texts, applying Grosz and Sidner's (1986)
attentional/intentional model. Marcu (2000) explores the feasibility of automatic rhetorical
parsing, applying Mann and Thompson's (1988) Rhetorical Structure Theory.
One important study for the purposes of the present book is Youmans (1991;
1994), who developed the Vocabulary Management Profile (VMP), a computa-
tional method to track the introduction of new vocabulary into a text. Youmans
shows that VMPs are quite sensitive indicators of the episodic structure of written
literary texts, suggesting that the VMP graph provides a direct visual analogue for
constituent structure (p. 113). Youmans compared the results of the VMP to the
paragraph boundaries of literary texts and found 80 percent agreement.
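The idea of tracking the introduction of new vocabulary can be illustrated with a simplified sketch. This is not Youmans' published procedure: the function name, the window size, and the toy token sequence are assumptions made for the example. For each window of running text, it computes the share of tokens that are appearing in the text for the first time.

```python
def vocabulary_profile(tokens, window=5):
    """Share of first-occurrence tokens in each sliding window of the text.

    Simplified illustration of a vocabulary-introduction profile;
    the window size is an arbitrary assumption.
    """
    seen = set()
    is_new = []
    for tok in tokens:
        # 1 if this word type has not appeared earlier in the text
        is_new.append(1 if tok not in seen else 0)
        seen.add(tok)
    profile = []
    for start in range(len(tokens) - window + 1):
        profile.append(sum(is_new[start:start + window]) / window)
    return profile

# Toy sequence: repeated vocabulary followed by a run of new words
tokens = "a b a b a b c d e f".split()
profile = vocabulary_profile(tokens, window=2)
print(profile)
```

In a profile of this kind, stretches with low values correspond to continued use of established vocabulary, while a rise suggests that new vocabulary is entering the text, which is the sort of signal that might be compared against paragraph or episode boundaries.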
Fewer computational studies have focused on the discourse structure of spo-
ken discourse. One of the best known of these, Passonneau and Litman (1996;
1997), attempts to automatically segment spoken texts (spontaneous, narrative
monologues) into discourse units, based on the use of referential noun phrases,
cue words, and pauses. This study further compares the results of the automatic
segmentation to perceptually-identified discourse units.
1.3 Discourse studies of social practices and ideological assumptions associated with communication
Finally, the third approach to discourse, the study of communicative social
practices and ideological assumptions, focuses on the social construction of discourse
rather than the linguistic description of particular texts. For example, proponents
of the New Rhetoric (e.g., Bazerman, 1988, 1994; Berkenkotter & Huckin, 1995;
Miller, 1984) have argued for the importance of understanding the knowledge of
social context surrounding texts for helping writers select rhetorical strategies that
work in a given situation. The focus here is to look not only at the products (texts)
but also the processes surrounding the production and consumption of texts,
asking "Why are specific discourse-genres written and used by the specialist
communities the way they are?" (Bhatia, 1993a, p. 11).
In an attempt to understand the broader social contexts of the discourse, sev-
eral recent corpus-based studies have added analyses of interviews and focus
group discussions with actual writers and readers of the texts or other academic
specialists. For example, Hyland (2000) goes beyond the textual approach to
discourse analysis of academic articles by adding focus groups, unstructured
interviews, and discourse-based interviews with subject specialists from those
disciplines, although the interviewees were not the writers of the articles in Hyland's
corpus. The focus groups and the first part of the one-to-one interviews used a
semi-structured format and encouraged the informants to speak generally about
communication and publication practices in their fields. The second stage used a
discourse-based interview which involved detailed discussions about particular
pieces of writing. The informants responded as members of the particular dis-
course community as they interpreted meanings, reconstructed writer motiva-
tions, and evaluated rhetorical effectiveness. They were also encouraged to discuss
specific points in their own work by referring to a paper they had written.
In another corpus study, Hyland (2004b) analyzed a corpus of 240 disserta-
tions by L2 writers at Hong Kong universities, together with interviews with 24
students. The interviews helped in understanding the use of the analyzed
metadiscourse markers: transitions, frame markers, endophoric markers, evidentials,
code glosses, hedges, boosters, attitude markers, engagement markers, and self-
mentions. Such qualitative analyses can shed light on disciplinary differences as
well as differences between MA and PhD level writers even if the interviewees are
not the actual writers.
Unlike many qualitative studies of texts and writing, in which the researcher
observes, interviews, and works with the actual writer or writers (see, e.g., Bazer-
man & Prior, 2004), corpus studies tend to rely on anonymous writers who are
members of the particular discourse community. In many cases, corpora are con-
structed from published resources, rather than being collected from writers per-
sonally, making it nearly impossible to obtain information about the writers and
the circumstances of writing. However, as in the Hyland studies cited above, it is
possible to combine corpus-based analysis with the careful observation of indi-
vidual writers. For example, Connor & Mauranen (1999) undertook a large-scale
corpus analysis of rhetorical moves in grant proposal writing in the sciences and
humanities. This study was later complemented by detailed interviews with five
scholars in these disciplines (Connor, 2000). These scholars were not the writers of
the proposals in the large corpus. However, as specialist informants they were able
to comment on the appropriateness of the move definitions and the identification
of move boundaries in a small corpus of their own proposals.
1.4 Register and genre perspectives on discourse
The terms register and genre have been central to previous investigations of dis-
course. Both terms have been used to refer to varieties associated with particular
situations of use and particular communicative purposes. Many studies simply
adopt one of these terms and disregard the other. In some cases, these authors
might be assuming a theoretical distinction between the two terms, but that dis-
tinction is usually not explicitly noted. For example, studies like Bhatia (2002),
Samraj (2002), Bunton (2002), Love (2002), and Swales (2004) exclusively use the
term genre. In contrast, studies like Ure (1982), Ferguson (1983), Hymes (1984),
Heath and Langman (1994), Bruthiaux (1994; 1996), Conrad (2001), and Biber et
al. (1999) exclusively use the term register.
A few studies attempt to define a theoretical distinction between the constructs
underlying these two terms. For example, Ventola (1984) and Martin (1985) refer
to register and genre as different semiotic planes: genre is the content-plane of
register, and register is the expression-plane of genre; register is in turn the con-
tent-plane of language. Lee (2001) surveys the use of these terms, providing one of
the most comprehensive discussions of how they have been used in previous re-
search (as well as terms like text type and style).
When research studies have attempted to distinguish between register and
genre (such as Couture, 1986; Ferguson, 1994; Martin, 1985; Swales, 1990; Ventola,
1984), the distinction has been applied at two different levels of analysis:
1) to the object of study;
2) to the characteristics of language and culture that are investigated.
Thus, the term register (when it is distinguished from genre) has been used to refer
to a general kind of language associated with a domain of use, such as a legal reg-
ister, scientific register, or bureaucratic register. Register studies have usually fo-
cused on lexico-grammatical features, showing how the use of particular words
and grammatical features vary systematically in accord with the situation of use
(factors such as interactivity, personal involvement, mode, production
circumstances, and communicative purpose). As such, the term register has been
associated with the first general approach to discourse identified in Section 1 above:
the study of language use.
In contrast, the term genre has been used to refer to a culturally recognized
message type with a conventional internal structure, such as an affidavit, a biology
research article, or a business memo. Genre studies have usually focused on the
conventional discourse structure of texts or the expected socio-cultural actions of
a discourse community. For example, genres are "how things get done, when
language is used to accomplish them" (Martin, 1985, p. 250), and "frames for social
action" (Bazerman, 1997b, p. 19). As such, the term genre is often associated with
the second general approach to discourse identified in Section 1 above: the study
of linguistic structure beyond the sentence.
In his previous work on linguistic variation, Biber has disregarded theoretical
distinctions between the terms register and genre, preferring the term genre in ear-
lier studies (e.g., Biber 1986, 1988) and the term register in later research (Biber,
1995, 2006b). In both cases, these were used simply as a general cover term to refer
to situationally-defined varieties described for their characteristic lexico-grammat-
ical features, with no implied theoretical distinction between register and genre.
However, in the present book we are focused especially on the internal struc-
ture and organization of texts from a specific variety (e.g., fundraising letters or
biology research articles), a perspective typically associated with the analysis of a
genre rather than register. For this reason, we adopt the term genre throughout the
book to refer to the linguistic variety being analyzed.
1.5 Identifying structural units in discourse
One specific research emphasis for discourse studies of structure beyond the sen-
tence has been the attempt to segment a text into higher-level structural units. These
studies are foundational to the goals of the present book, because the units of analy-
sis in corpus-based studies of discourse structure must be well-defined discourse
units: the segments of discourse that provide the building blocks of texts.
In studies of written texts, discourse units have generally been identified based
on visual as well as textual clues (see, e.g., Hunston, 1994). The smallest unit of
analysis has usually been the proposition, followed by the t-unit or sentence, the
paragraph, and finally the chapter or the whole text (Meyer, 1985). Such units are
identified by written para-linguistic devices (such as sentence punctuation and
paragraph indenting), rather than analysis of textual content or function.
Other studies have considered the initiation of new topics within a text. Inves-
tigating written fiction, Youmans (1991, p. 774) claimed that syntactic function
words do not denote new topics, whereas content words do. Similarly, Fox (1987)
found that, in expository writing, full noun phrases are more likely than pronouns
to indicate the start of a new topic.
In spoken discourse (especially conversation) it has proved especially difficult
to determine what constitutes a new topic, resulting in a reliance on qualitative or
impressionistic findings. As Tannen (1984, p. 38) notes, the boundaries of the
shifting topics in conversation are not always clearly and readily identifiable, and
the initiation of new topics is often unclear (see also Tannen, 1984, 1989; Van Dijk,
1997). Some research has suggested that prosodic and linguistic cues can be used
to determine topical boundaries in oral discourse. For example, pauses, hesita-
tions, false starts, change in pitch, discourse particles, preposed adverbials, sum-
mary statements, and evaluative comments have all been proposed as linguistic
markers that signal a discourse shift in theme or topic (e.g., Brown & Yule, 1983;
Gee, 1986; Korolija & Linell, 1996; Polanyi, 1985; Stubbs, 1983; Tannen, 1987; Van
Dijk, 1981).
In general, these studies have focused on linguistic devices that signal the transi-
tion from one topic to the next, but they have not attempted to rigorously segment
complete texts into well-defined discourse units. However, this is exactly the task
that must be accomplished for corpus-based analyses of discourse structure: we
need comprehensive identification of the structural discourse units within all texts
in the corpus. Two general approaches to text segmentation have been employed in
previous corpus-based research: top-down and bottom-up methods of segmenta-
tion. The following section discusses these two approaches in more detail.
2 Corpus-based investigation of discourse structure
As summarized in the sections above, research on the linguistic characteristics of
texts and discourse has been carried out from two major perspectives: one focusing
on the distribution and functions of surface linguistic features (corpus studies
of language use in discourse, which typically disregard the existence of individual
texts), and the second focusing on the internal organization of texts (discourse
studies of linguistic structure beyond the sentence in particular texts).
Discourse studies of language use have usually been quantitative, and in more
recent years, they have been carried out on large text corpora using the techniques
of corpus linguistics; these studies often compare the linguistic characteristics of
discourse from different spoken and written registers. Studies of the second type
have usually been qualitative and based on detailed analysis of a small number of
texts; these studies usually focus on the internal structure of a few texts from a
single genre, such as scientific research articles.
Römer (2005) is a good example of the first approach. This study describes the
use of progressive verb phrases in spoken English, based on analysis of the British
National Corpus and the Bank of English. Rather than focusing on the organiza-
tion of any particular text, the study focuses on the overall patterns of distribution
and use, considering factors such as the tendency of progressives to occur with
different tenses and aspects; occurrence with different subject types or object
types; occurrence with different adverbials; and the tendency to occur with spe-
cific verbs and verb classes. In contrast, the chapters in Mann and Thompson
(1992) are good examples of the second approach. This book is based on analysis
of a single fundraising letter, showing how the discourse structure and organiza-
tion of that single text can be analyzed from different perspectives.
Surprisingly, few studies have attempted to combine these two research per-
spectives. On the one hand, most corpus-based studies have focused on the quan-
titative distribution of lexical and grammatical features, generally disregarding the
language used in particular texts and higher-level discourse structures or other
aspects of discourse organization. On the other hand, most qualitative discourse
analyses have focused on the analysis of discourse patterns in a few texts from a
single genre, but they have not provided tools for empirical analyses that can be
applied on a large scale across a number of texts or genres. As a result, we know
little at present about the general patterns of discourse organization across a large
representative sample of texts from a genre.
One of the major methodological problems to be solved by any corpus-based
analysis of discourse structure is deciding on a unit of analysis. That is, the first
step in an analysis of discourse structure is to identify the internal discourse seg-
ments of a text, corresponding to distinct propositions, topics, or communicative
functions; these discourse segments become the basic units of the subsequent dis-
course analysis. For a corpus study of discourse structure, all texts in the corpus
must first be analyzed for their component discourse units.
However, such analyses were not even possible based on early text corpora,
because they were composed of text-files rather than complete texts. For example,
text files in the Brown, LOB, and London-Lund Corpora were defined by length:
2,000 words long in the case of Brown and LOB, and 5,000 words long in the case
of London-Lund. In some cases, a single text file combines multiple texts, while in
other cases a text is truncated in a text file when the word limit is reached.
This characteristic of early corpora might help to explain why most previous cor-
pus studies have not considered individual texts at all. Rather, the analysis has re-
ported general patterns for the corpus as a whole, or it has compared overall results for
various sub-corpora (e.g., the overall frequency of progressive verbs in a conversa-
tional sub-corpus compared to the frequency in a sub-corpus of academic writing).
More recently, corpora such as the BNC and T2K-SWAL have been designed
to include complete texts, such as complete chapters from a book or complete re-
search articles. It is thus possible, in theory, to analyze the internal discourse struc-
ture of each text in the corpus, and to then discover general patterns of discourse
organization that hold across all texts in the corpus. To achieve this goal, corpus
texts must first be segmented into well-defined discourse units, and then those
units can be used to identify the general ways in which the discourse of corpus
texts is organized. In the following section, we introduce the two major corpus-
based approaches that can be applied to these research goals.
3 Top-down versus bottom-up corpus-based approaches to discourse analysis
To achieve generalizable corpus-based descriptions of discourse structure, seven
major analytical steps are required:
1. Determining the types of discourse units, that is, the functional/communicative
   distinctions that discourse units can serve in these texts
   (Communicative/Functional Categories)
2. Segmenting all texts in the corpus into well-defined discourse units
   (Segmentation)
3. Identifying and labeling the type (or category) of each discourse unit in each
   text of the corpus (Classification)
4. Analyzing the linguistic characteristics of each discourse unit in each text of
   the corpus (Linguistic analysis of each unit)
5. Describing the typical linguistic characteristics of each discourse unit type, by
   comparing all discourse units of a given type across the texts of the corpus
   (Linguistic description of discourse categories)
6. Describing the discourse structures of particular texts as sequences of
   discourse units, in terms of the general type or category of each of those units
   (Text structure)
7. Describing general patterns of discourse organization that hold across all texts
   of the corpus (Discourse organizational tendencies)
These seven steps can be achieved through either a top-down research approach or
a bottom-up research approach. The two approaches differ primarily in the order
of analytical steps. In a top-down approach, the analytical framework is developed
at the outset: the discourse unit types are determined before beginning the corpus
analysis, and the entire analysis is then carried out in those terms. In a bottom-up
approach, the corpus analysis comes first, and the discourse unit types emerge
from the corpus patterns. Tables 1.1 and 1.2 summarize the major differences be-
tween these two analytical approaches.
Table 1.1 Top-down corpus-based analyses of discourse organization

Required step in the analysis            Realization in this approach
1. Communicative/Functional Categories   Develop the analytical framework: determine the
                                         set of possible functional types of discourse
                                         units, that is, the major communicative
                                         functions that discourse units can serve in
                                         the corpus
2. Segmentation                          Segment each text into discourse units (applying
                                         the analytical framework from Step 1)
3. Classification                        Identify the functional type of each discourse
                                         unit in each text of the corpus (applying the
                                         analytical framework from Step 1)
4. Linguistic analysis of each unit      Analyze the lexical/grammatical characteristics
                                         of each discourse unit in each text of the corpus
5. Linguistic description of discourse   Describe the typical linguistic characteristics of
   categories                            each functional category, based on analysis of
                                         all discourse units of a particular functional
                                         type in the corpus
6. Text structure                        Analyze complete texts as sequences of discourse
                                         units shifting among the different functional
                                         types
7. Discourse organizational tendencies   Describe the general patterns of discourse
                                         organization across all texts in the corpus

In the top-down approach, the first step is to develop the analytical framework,
determining the set of possible discourse unit types based on an a priori determi-
nation of the major communicative functions that discourse units can serve in
these texts. That framework is then applied to the analysis of all texts in a corpus.
Thus, when texts are segmented into discourse units, it is done by identifying a
stretch of discourse of a particular type, that is, a stretch that serves a particular
communicative function.
In contrast, in the bottom-up approach, the first step is to automatically seg-
ment all texts in the corpus into discourse units (based on linguistic criteria).
Those discourse units are then analyzed for many other linguistic features, and
grouped into clusters of discourse units that are linguistically similar. Only then
(after the discourse units have already been grouped linguistically) are those
groupings interpreted as discourse unit types, by determining their typical
functions in texts.
Table 1.2 Bottom-up corpus-based analyses of discourse organization

Required step in the analysis            Realization in this approach
1. Segmentation                          Segment each text in the corpus into discourse
                                         units, based on shifts in vocabulary or other
                                         linguistic features
2. Linguistic analysis of each unit      Analyze the full range of lexical/grammatical
                                         characteristics of each discourse unit in each
                                         text of the corpus
3. Classification                        Identify the set of discourse unit types that
                                         emerge from the corpus analysis, based on
                                         linguistic criteria; that is, group all discourse
                                         units in the corpus into linguistically-defined
                                         categories or types
4. Linguistic description of discourse   Describe the typical linguistic characteristics of
   categories                            each discourse category, based on analysis of all
                                         discourse units of a particular type in the corpus
5. Communicative/functional categories   Describe the functional bases of each discourse
                                         category, based on post-hoc analysis of the
                                         discourse units identified as belonging to a
                                         particular type
6. Text structure                        Analyze complete texts as sequences of discourse
                                         units shifting among the different functional
                                         types
7. Discourse organizational tendencies   Describe the general patterns of discourse
                                         organization across all texts in the corpus

3.1 Examples of top-down analyses of discourse
Several top-level discourse structure theories were advanced by text linguists in
the 1980s and 1990s. Theories of superstructures were developed for different
types of texts such as exposition, argumentation, and narration. These superstruc-
tures of texts were called macrostructures by Van Dijk (1980), problem-solution
patterns by Hoey (1983; 1986), superstructures of arguments by Tirkkonen-Con-
dit (1985), and story grammars by Mandler and Johnson (1977).
Story grammar analysis had its start in the work of Labov and Waletzky (1967),
who proposed the following structure for analyzing oral narratives: orientation
(the major characters are introduced and a setting is established); complication (a
series of events unfold, and a crisis develops); resolution (the crisis is solved); and
coda (the final stage, in which the writer may express an attitude toward the story
or give her perspective on its significance). Although developed for oral texts orig-
inally, the story grammar analysis became a popular tool in written discourse
analysis. Martin and Rothery (1986) used it effectively as a research and teaching
method for school writing in Australia.
There are other approaches to the analysis of text structure that could be clas-
sified as being top-down in nature. Mann and Thompson (1992) in their book,
Discourse Description: Diverse Linguistic Analyses of a Fund-raising Text, showcase
seven different methods for looking at the text organization of a single fundraising
letter. One, described by Callow and Callow (1992), is somewhat like the appeals
analysis described below, except that the focus is on identifying the kinds of
intended meanings (rather than appeals) that reflect the writer's purposes. These
different meaning purposes (e.g., informative, expressive, and conative [expressing
desires and intentions]) can be used to analyze the meaning-based structure of the
text. In their chapter of the book, Mann, Matthiessen, & Thompson (1992) use
Rhetorical Structure Theory (RST) to analyze the relational structure of a text. At
its most basic level, RST identifies coherence in a text, that is, how different parts
of a text relate to each other, or more specifically how one part of a text supports,
elaborates, provides background for, offers contrast to, justifies, etc., another part of
the text. By looking at these relationships, the rhetorical structure of the texts in a
corpus could also be mapped out (see also Fox, 1987, Chapters 4-5).
Connor (1996) pointed out that the above kinds of analyses provided a new
development in written discourse analysis. Researchers became keenly aware that
different textual modes (e.g., narration, exposition, argumentation) used different
discourse structures. Unlike the study of cohesion, for example, the analysis of
superstructures was specific to a text type. The increased interest in specific genres
has further stimulated research on discourse structures of texts.
Move analysis (Swales, 1981, 1990) is an example of such a specific genre anal-
ysis. Move analysis was developed as a top-down approach to analyze the discourse
structure of texts from a genre; the text is described as a sequence of moves, where
each move represents a stretch of text serving a particular communicative function.
The analysis begins with the development of an analytical framework, identifying
and describing the move types that can occur in this genre: these are the functional/
communicative distinctions that moves can serve in the target genre.
Subsequently, selected texts are segmented into moves, noting the move type
of each move. The overall discourse structure of a text can be described in relation
to the sequence of move types. For example, a research article might begin with a
move that identifies the topic and reviews previous research, followed by a move
that identifies a gap in previous research, followed by a move that outlines the
goals of the present study, summarizes the major findings, and outlines the or-
ganization of the paper.
Until recently, top-down approaches (including move analysis) have not been
applied to an entire corpus of texts, because it is highly labor-intensive to apply a
top-down analytical framework to a large corpus of texts. However, this invest-
ment of labor pays off by enabling generalizable analyses of discourse structure
across a representative sample of texts from a genre. For example, once a corpus of
texts has been coded for moves, we can easily analyze the typical linguistic (lexical
and grammatical) characteristics of each move type. It is then possible to identify
the sequences of move types that are typical for a genre, and against that
background, it is also possible to identify particular texts that use more innovative
sequences of move types. In summary, corpus-based move analyses illustrate the
top-down approach: the functional analytical framework is developed first; that
framework is then applied to segment texts into discourse units (moves); and finally the
moves and functional move types are analyzed to describe their linguistic
characteristics. Chapters 3-4 in the present book illustrate this general approach to
discourse structure.
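
Once texts have been coded for moves, the search for typical and innovative
sequences described above can be sketched in a few lines of Python. This is only
an illustration under stated assumptions: the corpus, the text ids, and the move
labels ("topic", "gap", "goals") are hypothetical placeholders, not the move
types defined in the later chapters, and the function names are invented here.

```python
from collections import Counter

# Hypothetical move-coded corpus: each text is the ordered list of its move types.
# The labels are illustrative placeholders, not the move types used in this book.
coded_corpus = {
    "text_01": ["topic", "gap", "goals"],
    "text_02": ["topic", "gap", "goals"],
    "text_03": ["topic", "goals"],
    "text_04": ["gap", "topic", "goals"],
}

def sequence_counts(corpus):
    """Count how many texts realize each complete sequence of move types."""
    return Counter(tuple(moves) for moves in corpus.values())

def transition_counts(corpus):
    """Count adjacent pairs of move types across all texts."""
    pairs = Counter()
    for moves in corpus.values():
        pairs.update(zip(moves, moves[1:]))
    return pairs

def atypical_texts(corpus):
    """Return ids of texts whose full sequence differs from the most common one."""
    typical, _ = sequence_counts(corpus).most_common(1)[0]
    return sorted(tid for tid, moves in corpus.items() if tuple(moves) != typical)

typical_sequence, n_texts = sequence_counts(coded_corpus).most_common(1)[0]
print("Most common sequence:", typical_sequence, "in", n_texts, "texts")
print("Texts departing from it:", atypical_texts(coded_corpus))
```

In a real study the same counts would be read against the size and composition
of the corpus; a departure from the most common sequence is a candidate for
closer qualitative inspection, not automatically an innovation.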
Rhetorical appeals analysis is another top-down approach (see Chapter 5). In-
stead of describing texts according to their communicative functions (moves),
rhetorical appeals analysis divides texts into sections using the three basic means
of Aristotelian persuasion: ethos, pathos, and logos. Similar to move analysis, this
approach begins with the development of an analytical framework, identifying
and defining the appeal types. The texts in a corpus are then analyzed by applying
this analytical framework: segmenting texts into appeals, noting the appeal type of
each appeal.
In practice, most previous discourse analyses have been top-down. However,
there have been few previous top-down studies of discourse applied to an entire
corpus of texts, in large part because the analyses are so labor-intensive. In the
present book, we illustrate two particular top-down approaches to discourse: move
analysis (Chapters 3-4) and rhetorical appeals analysis (Chapter 5).
3.2 Example of bottom-up approach
In contrast to the long research tradition applying top-down analyses of discourse,
the bottom-up approach was only recently developed, specifically for corpus-
based analyses of discourse structure. This approach has not been previously prac-
ticed by discourse analysts because it requires advanced computational techniques
and does not make sense for the analysis of an individual text. That is, a discourse
analyst traditionally begins by considering the communicative-functional context
of a text, and relies on those considerations to identify the components of the text,
and how a text is organized in those terms.
In contrast, the bottom-up approach was developed to address the methodo-
logical problem of how discourse patterns could be analyzed in a large corpus, with
hundreds or thousands of texts. In theory, top-down analyses can also be applied to
large text corpora, but in practice, such analyses are limited by the human resourc-
es that are available for manually coding discourse units in texts. The bottom-up
approach has no such limitations, because it incorporates automatic computational
techniques which can be easily applied to the analysis of hundreds of texts.
Vocabulary-Based Discourse Unit (VBDU) analysis is the specific bottom-up
approach illustrated in the present book. (See Chapter 6 for a detailed
description.) The first step is to automatically segment texts into discourse units: the
VBDUs. This is done using computational techniques, based on vocabulary
repetition. At this stage, we know nothing about the underlying types of discourse units
or the communicative functions served by these types. Then, in the second step,
we undertake comprehensive linguistic descriptions of each VBDU (again utiliz-
ing automatic computational techniques). These linguistic descriptions are used to
group VBDUs into categories, so that all the VBDUs in a grouping are similar lin-
guistically. At that point, functional considerations become important, because
the linguistic groupings of VBDUs are interpreted as functional VBDU-types. That
is, each type represents a grouping of VBDUs that are similar in their lexico-
grammatical characteristics, and those groupings are interpreted to identify their
typical discourse meanings and functions. Finally, the overall discourse organiza-
tion of texts is described as sequences of VBDUs, noting the functional discourse
type of each VBDU.
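
The idea of segmenting a text on the basis of vocabulary repetition can be
illustrated with a minimal Python sketch. It is not the procedure used in this
book (that is described in Chapter 6); it assumes a simple stand-in technique in
which the word types in a window before each position are compared with those in
a window after it, and a boundary is proposed where the overlap is lowest. The
function names, the Jaccard overlap measure, and the `window` and `threshold`
values are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def overlap(left, right):
    """Jaccard overlap between the word types of two windows."""
    a, b = set(left), set(right)
    return len(a & b) / len(a | b) if a | b else 0.0

def segment(tokens, window=50, threshold=0.1):
    """Split tokens into units at points of low vocabulary repetition.

    A boundary is proposed at position i when the overlap between the
    `window` words before i and the `window` words after i is a local
    minimum that falls below `threshold`. Both values are illustrative.
    """
    positions = list(range(window, len(tokens) - window + 1))
    scores = [overlap(tokens[i - window:i], tokens[i:i + window]) for i in positions]
    boundaries = []
    for k, i in enumerate(positions):
        prev_score = scores[k - 1] if k > 0 else 1.0
        next_score = scores[k + 1] if k + 1 < len(scores) else 1.0
        if scores[k] < threshold and scores[k] <= prev_score and scores[k] < next_score:
            boundaries.append(i)
    cuts = [0] + boundaries + [len(tokens)]
    return [tokens[a:b] for a, b in zip(cuts, cuts[1:])]

# Toy text: two stretches with no shared vocabulary.
toy = " ".join(["donation gift support fund"] * 3 + ["protein enzyme cell assay"] * 3)
units = segment(tokenize(toy), window=4, threshold=0.1)
print(len(units), [u[0] for u in units])
```

On real texts the window size and threshold would need to be tuned, and function
words would normally be handled separately, since they repeat everywhere and so
mask shifts in content vocabulary.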
One major difference between the two approaches is the role of the functional
versus linguistic analyses. In the top-down approach, the functional framework is
primary. Thus, the first step in the analysis is to determine the possible discourse
unit types (e.g., move types) and provide an operational definition for each one.
This functional framework is then used to segment texts into discourse units. Lin-
guistic analysis is secondary in a top-down approach, serving an interpretive role
to investigate the extent to which functionally-defined discourse units also have
systematic linguistic characteristics.
In contrast, the linguistic description is primary in the bottom-up approach.
Texts are automatically segmented into VBDUs based on vocabulary patterns, and
then VBDUs are grouped into categories based on the use of a wide range of lexi-
co-grammatical features. Functional analysis is secondary in VBDU analysis, serv-
ing an interpretive role to investigate the extent to which linguistically-defined
discourse unit categories also have systematic functional characteristics.
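
The grouping step, in which discourse units are sorted into linguistically
similar categories before any functional interpretation, can be sketched as
follows. The sketch assumes k-means clustering as a stand-in grouping technique;
the source does not say at this point which statistical procedure is used. The
feature values are invented rates per 100 words for three hypothetical features
(first-person pronouns, past-tense verbs, nominalizations).

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def k_means(vectors, k, iterations=20):
    """Assign each vector to one of k clusters; returns a list of cluster indices.

    Centroids start at the first k vectors so the result is deterministic.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: distance(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean(members)
    return labels

# Invented rates per 100 words: [first-person pronouns, past tense, nominalizations]
unit_features = [
    [6.0, 1.0, 0.5],
    [0.5, 7.0, 3.0],
    [5.5, 1.5, 1.0],
    [1.0, 6.5, 3.5],
    [6.5, 0.5, 0.5],
]
labels = k_means(unit_features, k=2)
print(labels)
```

The cluster indices carry no meaning in themselves. Interpreting what each
group of units does in texts is the separate, secondary functional step
described above.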
4 Creating a specialized corpus for discourse analysis
One of the central methodological issues for corpus-based research is to ensure
that the corpus chosen for analysis actually represents the discourse domain being
studied and is thus suitable for the research questions being investigated (see Biber
1993, 2004). This is of course no different than any other quantitative research in
the social sciences, where there is always concern that the sample being studied
actually represents the larger target population (one of the potential threats to
external validity).
Corpus-based studies of discourse structure are potentially problematic in this
regard for two related reasons:
1. Corpora are often designed for general use rather than a specific study. As a
result, the population being represented can be relatively general, such as
newspaper language, or even an entire language.
2. Researchers sometimes choose to use a corpus just because it is publicly avail-
able, with little consideration of whether that corpus actually represents the
target population being investigated.
However, these problems can be readily addressed. Most corpora have been de-
signed with relatively well-specified sub-corpora that represent particular text cat-
egories, such as academic research articles, newspaper editorials, or face-to-face
conversation. When corpus studies have been based on particular sub-corpora,
the findings have been much more interpretable. In addition, many recent corpora
have been designed for more particular research purposes. For example, the
T2K-SWAL Corpus (a relatively general corpus) was designed to represent the range
of spoken and written genres used in American universities (including
sub-corpora for office hours, study groups, textbooks, course syllabi, etc.; see Biber 2006b).
The ICIC Fundraising Corpus is somewhat more specialized, designed to repre-
sent American fundraising discourse, including sub-corpora for genres like direct
mail letters and grant proposals (see Connor & Upton, 2004a, 2004b; Upton, 2002;
Upton & Connor, 2001).
In general, more specialized corpora are more appropriate for the study of dis-
course structure. The corpora used in the present book are all relatively specialized,
but they differ in the extent to which they represent a narrowly-defined genre. At
one extreme, the study reported in Chapter 4 of the present book is based on a
highly restricted corpus of research articles published in biochemistry academic
journals. Prior research was carried out to identify the five most prestigious aca-
demic journals in this discipline, and then research articles were collected over a
12-month period from those journals. The study reported in Chapter 7 is based on
a corpus of research articles published in biology academic journals, but it deliber-
ately includes a range of sub-disciplines in the sample. The study in Chapter 3 is
based on analysis of the direct mail letters included in the ICIC Fundraising Cor-
pus; these include letters from a wide variety of non-profit organizations across a
wide variety of non-profit fields (e.g., health and human services, education). Fi-
nally, the corpus used in Chapter 8 is probably the least specialized, consisting of
transcripts from university-level classroom teaching sessions collected across sev-
eral different academic disciplines. However, all corpora used here are relatively
specialized, restricted to particular genres. Such corpora are required for corpus-
based studies of discourse structure: each text has its own discourse organization,
and it is reasonable to hypothesize that all texts from a genre will tend to share
similar patterns of discourse organization. Our goals in the present book are rela-
tively straightforward: we hope to analyze corpora that represent particular genres,
to describe the patterns of discourse organization in those genres and to investigate
empirically the variation in discourse patterns across texts within a single genre.
5 Overview of the book
The book is organized into two parts, corresponding to the two major corpus-based
approaches to discourse organization introduced in Section 3 above. Part I of the
book focuses on Top-down analyses of discourse organization. Chapter 2 intro-
duces top-down analysis in greater detail, describing the analytical procedures re-
quired for these analyses, with a special focus on genre-based move analysis and
the methodological issues that arise during the application of this approach to the
analysis of a corpus of texts. Part I of the book then presents three case studies il-
lustrating the top-down approach. The first case study (Chapter 3) describes how
fundraising letters are structured in terms of rhetorical moves, focusing on the lin-
guistic expression of stance in the different move types. The second case study
(Chapter 4) describes the typical discourse organizations of biochemistry research
articles, again using move analysis as the primary analytical framework. Rather
than focusing on a restricted set of linguistic features, this second case study under-
takes a multi-dimensional analysis (see Appendix One) to describe the typical lin-
guistic characteristics of move types in this genre with respect to a wide range of
lexical and grammatical features. Finally, the last chapter in Part I of the book in-
troduces a second top-down approach to discourse structure: appeals analysis.
This approach is applied to the same corpus of fundraising letters as in Chapter 3,
allowing a direct comparison of these two analytical approaches.
Part II of the book, Bottom-up analyses of discourse organization, then
deals primarily with Vocabulary-Based Discourse Unit (VBDU) analysis. Chapter
6 introduces this analytical framework in detail, describing both the analytical
procedures and experimental research that explores the extent to which the auto-
matically-identified VBDUs correspond to discourse units recognized on a per-
ceptual basis by human raters. Two case studies based on this approach are then
presented: Chapter 7 presents a bottom-up analysis of a corpus of biology research
articles, describing how texts from this genre are structured as sequences of VB-
DUs; Chapter 8 presents a similar analysis of VBDUs in university classroom
teaching sessions. Finally, the concluding chapter (9) provides a synopsis of find-
ings, a more theoretical discussion of the strengths and weaknesses of each ap-
proach, and a discussion of future prospects for investigations of this type.
Part 1
Top-down analyses of discourse organization

chapter 2
Introduction to move analysis
WITH Budsaba Kanoksilapatham
In Chapter 1, we introduced two different approaches for using corpora to
analyze discourse organization: top-down and bottom-up corpus-based analyses.
This chapter focuses primarily on one type of top-down approach: move
analysis. We give a detailed description of move analysis including what it is,
what this type of analysis tells you, examples of studies using move analysis, steps
to conducting a move analysis, and special considerations for and advantages of
using a corpus-based approach. As noted in the previous chapter, there are many
top-down approaches to discourse analysis, like the appeals analysis described in
Chapter 5; move analysis, however, is the approach that has been most frequently
used to date in corpus-based studies. Chapters 3 and 4 provide specific examples
of these kinds of studies. The intent of the present chapter is to introduce the
goals and methods of corpus-based move analysis (as one common type of
top-down discourse analysis), in order to show how generalizable corpus-based
descriptions of discourse organizational patterns can be achieved using a top-
down approach.
1 Background
Genre analysis using rhetorical moves was originally developed by Swales (1981) to
describe the rhetorical organizational patterns of research articles. Its goal is to
describe the communicative purposes of a text by categorizing the various dis-
course units within the text according to their communicative purposes or rhetori-
cal moves. A move thus refers to a section of a text that performs a specific com-
municative function. Each move not only has its own purpose but also contributes
to the overall communicative purposes of the genre. In Swales' words, these purposes
together constitute the rationale for the genre, which in turn shapes the
schematic structure of the discourse and influences and constrains choice of con-
tent and style, with texts in a genre exhibiting various patterns of similarity in
terms of structure, style, content and intended audience (1990, p. 58).
Genre analysis was developed in the 1970s and 1980s as part of the wider
growth of discourse analyses focusing on the organization of discourse. Bhatia
(2004) documents how structural concerns, for example Hoey's (1983) problem-solution
structure analysis, directed the analysts' attention away from studying
lexico-grammatical features of texts (e.g., passives and nominalizations, use of
tenses, coherence). Researchers involved in the analysis of text as genre further
related discourse structures to the communicative functions of texts, resulting in
the current approach of doing genre analysis using rhetorical moves.
In genre analysis, the purposes of the genre are recognized by the expert mem-
bers of the discourse community, less so by the novice members, and probably not
by the nonmembers. These purposes shape the rationale, and the rationale helps
develop the constraining conventions. According to Swales (1990), these conven-
tions are constantly changing but still exert influence. As we will see in later chap-
ters, discourse communities are powerful in shaping the conventions of the genre.
Research papers in scholarly disciplines are good examples of such discourse com-
munities where novice writers are indoctrinated into the paper-writing genre in
their graduate studies and young publishing lives. There are genres, however,
which are not shaped by such strong discourse community rationales. Take fund-
raising letters as an example. It is fair to say that both writers and readers recognize
a fundraising letter as such. However, since readers and potential donors do not
typically write them, conventions may not be so strictly adhered to. In fact, devi-
ance from conventions may seem fresh to the reader who may receive hundreds of
them a year but does not need to worry about writing any.
In move analysis, the general organizational patterns of texts are typically de-
scribed as consisting of a series of moves, with moves being functional units in a
text which together fulfill the overall communicative purpose of the genre (Con-
nor, Davis, & De Rycker, 1995). Moves can vary in length, but normally contain at
least one proposition (Connor & Mauranen, 1999). Some move types occur more
frequently than others in a genre and can be described as conventional, whereas
other moves that occur less frequently can be described as optional. Moves may
contain multiple elements that together, or in some combination, realize the move.
These elements are referred to as steps by Swales (1990) or strategies by Bhatia
(1993a). The steps of a move primarily function to achieve the purpose of the
move to which they belong (see, e.g., Crookes, 1986; Dudley-Evans, 1994a; Hopkins
& Dudley-Evans, 1988; Swales, 1981, 1984, 1990). In short, moves represent se-
mantic and functional units of texts that have specific communicative purposes; in
addition, as the following sections show, moves generally have distinct linguistic
boundaries that can be objectively analyzed.
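The distinction between conventional and optional move types can be made concrete by counting the share of texts in which each move type occurs. The following is a minimal sketch under stated assumptions, not a procedure taken from the studies cited here: the function names, the toy codings, and the 60% cut-off are all hypothetical, and an analyst would choose a cut-off suited to the genre under study.

```python
from collections import Counter

def move_type_coverage(coded_texts):
    """Return, for each move type, the share of texts containing it at least once."""
    texts_with_move = Counter()
    for moves in coded_texts:
        # set() ensures a move type that recurs within a text is counted once per text
        texts_with_move.update(set(moves))
    return {move: n / len(coded_texts) for move, n in texts_with_move.items()}

def classify_moves(coded_texts, cutoff=0.6):
    """Label each move type 'conventional' or 'optional'.

    The cutoff is an illustrative assumption, not a value from the source.
    """
    coverage = move_type_coverage(coded_texts)
    return {move: ("conventional" if share >= cutoff else "optional")
            for move, share in coverage.items()}

# Hypothetical codings: each inner list is the move sequence of one text
toy_corpus = [
    ["M1", "M2", "M3"],
    ["M1", "M3"],
    ["M1", "M2", "M1", "M3"],
    ["M1", "M3", "M4"],
]
coverage = move_type_coverage(toy_corpus)
labels = classify_moves(toy_corpus)
```

In this toy corpus, M1 and M3 occur in every text and are labeled conventional, while M2 (half of the texts) and M4 (one text in four) fall below the assumed cut-off and are labeled optional.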
2 Swales' move analysis of research articles
Swales (1981) developed the discourse approach of move analysis within the more
general field of English for Specific Purposes (ESP). This approach has been revised
and extended by several scholars, including Swales himself (1990). The original aim of Swales'
work on move analysis was to address the needs of advanced non-native English
speakers (NNSs) learning to read and write research articles, as well as to help NNS
professionals who want to publish their articles in English. His analysis of 48 introduction
sections in research articles from a range of disciplines (physics, medicine,
and social sciences), written in English, led Swales to propose a series of moves (i.e.,
specific communicative functions performed by specific sections of the introductions)
that defined the rhetorical structure of research article introductions.
A closer examination of Swales' move structure, or framework, for these introductions
helps elucidate the interaction between moves and steps in performing
communicative functions in scientific texts. Swales' three-move schema for article
introductions, known as the Create a Research Space (CARS) model,
is presented in Table 2.1. The model shows the preferred sequences of move types
and steps, which are largely predictable in research article introductions.
Table 2.1 CARS model for research article introductions, adapted from Swales
(1990, p. 141)
Move 1: Establishing a territory
        Step 1   Claiming centrality and/or
        Step 2   Making topic generalization(s) and/or
        Step 3   Reviewing items of previous research
Move 2: Establishing a niche
        Step 1A  Counter-claiming or
        Step 1B  Indicating a gap or
        Step 1C  Question raising or
        Step 1D  Continuing a tradition
Move 3: Occupying the niche
        Step 1A  Outlining purposes or
        Step 1B  Announcing present research
        Step 2   Announcing principal findings
        Step 3   Indicating RA structure
Swales' model includes three basic move types in research article introductions.
Move 1, Establishing a territory, introduces the general topic of research. Move 2,
Establishing a niche, identifies the more specific areas of research that require
further investigation. And Move 3, Occupying the niche, introduces the current
research study in the context of the previous research described in Moves 1 and 2.
Move 1 can have a maximum of three steps (Step 1, Step 2, and Step 3). In
Move 1, Step 1, Claiming centrality, the author can make a centrality claim by
claiming interest or importance in referring to the classic, favorite or central per-
spective, or by claiming that there are many investigators in the area. This step is
usually, but not always, at the beginning of the introduction. To illustrate Move 1,
Step 1, Swales (1990) presents the following examples:
The study of ... has become an important aspect of ...
A central issue in ... is the validity of ... (Swales, 1990, p. 144)
Move 1, Step 2, Making topic generalizations, represents a neutral kind of general
statement. It usually takes the form of either statements about knowledge or prac-
tice, or statements about phenomena. Usually, this step seeks to establish territory
by emphasizing the frequency and complexity of the data. Some examples of Move
1, Step 2 are:
The aetiology and pathology is well known.
A standard procedure for assessing ... has been ...
There are many situations where ... (Swales, 1990, p. 146)
The last step of this move, Step 3, Reviewing items of previous research, is where
the author reviews selected relevant groups of previous research. Here, the author
specifies the important findings of those studies and situates his/her own current research
study. Examples of Move 1, Step 3 are:
X was found by Sang et al. (1972) to be impaired.
Chomskyan grammarians have recently ... (Swales, 1990, p. 150)
In establishing territory, then, the author convinces the readers about the importance
of the area of study by making strong claims with reference to previously published
research, which can be done in three ways, as indicated by the three step options.
Move 2 of the CARS model, Establishing a niche for about-to-be presented re-
search, is considered a key move in research article introductions because it con-
nects Move 1 to Move 3, by articulating the need for the research that is being
presented. Move 2 is manifested in one of four ways: Step 1A, Counter-claiming;
Step 1B, Indicating a gap; Step 1C, Question raising; and Step 1D, Continuing a
tradition. The four options for realizing Move 2 are represented by the following
examples, taken from Swales (1990, p. 154):
Step 1A, Counter Claiming         Emphasis has been on ..., with scant attention given to ...
Step 1B, Indicating a Gap         The first group ... cannot treat ... and is limited to ...
Step 1C, Question Raising         Both suffer from the dependency on ...
Step 1D, Continuing a Tradition   A question remains whether ...
The final move type that Swales proposed for research article introductions is
Move 3, Occupying the niche. As noted earlier, Move 1 reports on the centrality of
the research topic or generalizations about previous research. Move 2 expresses
the authors' own opinions about the need for the current research (with reference
to the past literature). Importantly, Move 3 is distinct from the other two moves in
the Introduction in that the authors assume a more active role in the research con-
ducted, rather than just referring to previous studies or asserting the need for this
one. In fact, Move 3 is the only place in the research article introduction where the
authors express and enjoy their own accomplishment, pride, and commitment
(Swales, 1990). Move 3 introduces new research by first either Stating research
purpose(s) (Step 1A) or Describing the main features of the research (Step 1B), then
by Announcing the principal findings (Step 2), and then finally by Indicating the
research article structure (Step 3). Examples illustrating the steps of Move 3, taken
from Swales (1990, p. 160) are:
Step 1A, Outlining Purpose                      The aim of the present paper is to give ...
Step 1B, Announcing Present Research            This study was designed to evaluate ...
Step 2, Announcing Principal Findings           The paper utilizes the notion of ...
Step 3, Indicating Research Article Structure   This paper is structured as follows ...
Swales' CARS model for academic research articles has been widely studied and
validated since it was first published in 1990. The model has been shown to have a
recursive nature (what Swales has called recycling; 1990, p. 140), with moves
or steps occurring more than once, as well as with varied realizations in research
writing across contexts. For example, Bunton (2002) has shown that the genre of
Ph.D. theses introductions, while having the same general CARS structure pro-
posed by Swales, has some alternate ways for realizing the three basic moves. One
example of this is in Move 1, Establishing a Territory; Bunton proposes that a new
step Defining terms plays an important part in fulfilling the function of helping to
establish the territory to be covered in Ph.D. thesis introductions, while this is not
the case for research article introductions.
Indeed, subsequent research on the introduction section of research articles in
other disciplines (see discussion below) has helped us recognize how different disciplines
manipulate a common genre (in this case, research articles) to meet
their own communicative needs. Our understanding of one small section of academic
research articles, the Introduction, has evolved from a one-size-fits-all
perspective to a more subtle, discipline-specific understanding of the rhetorical
purposes and expectations of research articles. Swales (2004), in response to this
subsequent research, modified his model to better reflect the variability in how the
three move types are realized in different sub-genres of research article introductions.
His revised model, shown in Table 2.2, has a broader description of the communicative
purposes of Move 1 and Move 2; it also reflects, particularly in Move 3,
the variation that occurs in introductions in different research fields, and recognizes
the possibility of cyclical patterns of occurrence of the move types (described
further below) within the introduction section.
Table 2.2 Swales' revised model for research article Introductions (2004, pp. 230, 232)
Move 1: Establishing a territory (citations required) via Topic generalizations of
        increasing specificity
Move 2: Establishing a niche (citations possible) via:
        Step 1A: Indicating a gap, or
        Step 1B: Adding to what is known
        Step 2: Presenting positive justification (optional)
Move 3: Presenting the present work via:
        Step 1: Announcing present research descriptively and/or purposively (obligatory)
        Step 2: Presenting research questions or hypotheses* (optional)
        Step 3: Definitional clarifications* (optional)
        Step 4: Summarizing methods* (optional)
        Step 5: Announcing principal outcomes (optional)**
        Step 6: Stating the value of the present research (optional)**
        Step 7: Outlining the structure of the paper (optional)**
* Steps 2–4 are less fixed in their order of occurrence than the others.
** Steps 5–7 are probable in some fields, but unlikely in others.
The key point here is that while related genres will certainly share common move
types, each will have its own unique structural characteristics that reflect the
specific communicative functions that the genre has.
3 Move analysis of research articles applied across genres
3.1 Description and examples
While move analysis was originally developed as a tool to teach non-native speakers
the rhetorical structures of research articles, Swales' framework has been successfully
extended to other areas of English for Specific Purposes (ESP) instruction,
including English for Business and Technology (Bhatia, 1993a, 1997a) and
English for Professional Communication (Flowerdew, 1993). Swales' framework
of move analysis has stimulated substantial research on the rhetorical structures of
academic and professional texts. In academic writing, it has been applied to academic
disciplines including biochemistry (Kanoksilapatham, 2005; D. Thompson,
1993), biology (Samraj, 2002), computer science (Posteguillo, 1999), and medicine
(Nwogu, 1997; Williams, 1999), as well as to a variety of academic genres, including
university lectures (S. Thompson, 1994), master of science dissertations (Hopkins
& Dudley-Evans, 1988), and textbooks (Nwogu, 1991).
Within the genre of scientific research articles (the original focus of move analysis),
a number of move-based studies have focused on specific sections of research
articles. For example, Crookes (1986) compared Introduction sections of research
articles across a variety of fields; Wood (1982) described the moves of Methods sections
in chemistry articles; D. Thompson (1993) and Williams (1999)
focused on the moves of Results sections in biochemistry and medical research articles,
respectively; and Peng (1987) looked at the moves used in the Discussion section
of chemical engineering research articles. Posteguillo (1999), for computer science,
and Nwogu (1997), for medicine, both went a step further and explored the
use of moves across multiple sections within the genres they investigated, and
Kanoksilapatham (2005) has investigated the move structure of complete biochem-
istry research articles. A more detailed description of how move analysis was used
to describe the structure, and linguistic features, of entire biochemistry research
articles is provided by Kanoksilapatham in Chapter 4 of this book.
More recently, professional discourse has also been examined through the lens
of move analysis, including legal discourse (Bhatia, 1993b), philanthropic discourse
focusing on direct mail letters (Upton, 2002; Upton & Connor, 2001) and
grant proposals (Connor, 2000; Connor & Mauranen, 1999; Connor & Upton,
2004a), and movie reviews (Pang, 2002).
A brief description of a move analysis done on a corpus of job application letters
(Connor, Precht, & Upton, 2002) provides an interesting illustration of how
different genres can have quite different move types. The letters in this study were
from the Indianapolis Business Learner Corpus (IBLC), which included job application
letters written by business students at U.S., Belgian, and Finnish universities
between 1990 and 1998. The 99 letters in the corpus were generated by students
(all business and/or English majors) as part of a common class assignment.
Applying Swales' approach to analyze the genre of job application letters, the
following move types were identified:
Move 1: Identify the source of information. (Explain how and where you
learned of the position.)
I recently received word from Blockbuster Recruiting about a man-
agement position available at your company.
Move 2: Apply for the position. (State desire for consideration.)
I am very interested in a temporary job working as a European busi-
ness student intern in the U.S.A.
Move 3: Provide arguments for the job application.
Step 1: Implicit arguments based on neutral evidence or informa-
tion about background and experience. In providing sup-
porting information or arguments, the writers simply list their
background experience.
I received my Associates Degree in General Studies in May
1993. Previously I have received a degree in Office Manage-
ment from Indiana Business College and I have obtained the
Certified Professional Secretary (CPS) certification.
Step 2: Arguments based on what would be good for the hiring com-
pany. In this step, the writer argues explicitly that their experi-
ence or education will benefit the company that hires them.
My intercultural training will be an asset to your internation-
al negotiations team.
Step 3: Arguments based on what would be good for the applicant.
In this step, the writer argues how the position would in fact
be beneficial to him/herself.
The opportunity to study abroad the globalised business en-
vironment would help me gain the knowledge and experience
to grow in the changing business world of today.
Move 4: Indicate desire for an interview or a desire for further contact.
I hope I got you interested so that I will be selected for an inter-
view.
I'm always prepared to participate in an interview.
Move 5: Express pleasantries or appreciation at the end of the letter.
Thank you in advance for your consideration.
Thank you for your time in reviewing this material.
Move 6: Offer to provide more information.
I will be happy to provide you with any additional information that
you may need.
Move 7: Reference attached resume.
I have enclosed my resume...
A resume is enclosed.
The most obvious difference between the move structure of research article intro-
ductions and the move structure of letters of application is that the former has only
three major move types and the latter has seven. This is all the more interesting to
note because research article introductions (with only three major moves) are
typically much longer than letters of application. Three other important points are
illustrated by comparing these two move structures. The first is the fact that moves
are identified by the communicative purpose that the writer is seeking to accom-
plish, whether that be done in one sentence or five paragraphs. Consequently,
moves can be quite variable in length. The second point is that some genres have a
fairly simple move structure, with only three or four basic communicative func-
tions, while other genres may have a fairly complex move structure, with many
different communicative functions. The third point is that while some moves may
be realized through two or more different steps, other moves may only be expressed
in one general functional-semantic way (e.g., Swales' Move 1 has three
steps, while Connor et al.'s Move 1 has no step options).
There are two additional characteristics of moves that should be noted. The
first is that some move types in a genre may be more common (or obligatory),
while other moves may be optional. Lewin, Fine, and Young (2001) and Bhatia
(1993a) are among those who underscore this characteristic of moves. Bhatia prefers
the term strategy as opposed to step, to reflect the variability among elements
within a move: move elements may or may not regularly appear, and they can be
used in different sequential order. In Chapter 3 of this volume, for example, we
describe the variable move structure of direct mail letters; some of the move types
in this genre are clearly optional, and there is a fairly free ordering of the moves
within a given text. Similarly, Kwan (2006) shows that the third move (Occupying
the niche) is optional in the literature reviews of Ph.D. theses in applied linguistics.
In addition, it is possible that some move types will recur in a cyclical fashion
within a section of text (Swales, 2004). Typically, the cyclical reoccurrence of a
move within a section of text has been dealt with by considering each appearance
of a particular move as a separate occurrence. For example, if a text starts with, say,
Move Type 1, continues with Move Type 2, and then returns to Move Type 1, Move
Type 1 would be counted as having occurred twice. The studies in Chapters 3 and
4 both used this approach to identify and count moves. More rarely, a move can be
interrupted by another move type that is inserted into it (Upton, 2002).
While this is rather unusual, there can be clear instances where one communica-
tive functional unit (move type) of a text interrupts, often as an aside or a tangen-
tial comment, another very different communicative functional unit of text. The
study described in Chapter 3 provides an example of this. These cyclical and em-
bedded patterns of move types tend to occur mainly in genres that are less con-
strained and allow more variability than those that are more prescribed.
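The counting convention for cyclical moves described above (a text coded Move Type 1, Move Type 2, Move Type 1 contains two occurrences of Move Type 1) can be sketched in a few lines. This is an illustrative sketch only: it assumes that each text has been coded as a list with one move label per segment, and that adjacent segments carrying the same label belong to a single occurrence of that move. The label format and function names are hypothetical.

```python
from collections import Counter
from itertools import groupby

def move_occurrences(segment_labels):
    """Collapse each run of adjacent, identically labeled segments into one occurrence."""
    return [label for label, _run in groupby(segment_labels)]

def count_occurrences(segment_labels):
    """Count how many separate times each move type occurs in one text."""
    return Counter(move_occurrences(segment_labels))

# Hypothetical coding of one text, one label per segment
segments = ["M1", "M1", "M2", "M1", "M3", "M3"]
sequence = move_occurrences(segments)
counts = count_occurrences(segments)
```

Here the move sequence is M1, M2, M1, M3, so Move Type 1 is counted twice even though it is a single move type. Embedded moves would need a richer representation than a flat list of labels and are not handled by this sketch.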
3.2 Summary of previous research on move analysis
To highlight key points introduced above, move analysis proposes that genres are
composed of definable and, to a great extent, predictable functional components,
that is, moves of certain types. For example, article introductions typically have
three rhetorical move types: establishing a territory, establishing a niche, and occupying
the niche. Letters of application have seven distinguishable move types, as
described above.
According to Bhatia (1993a), the move structuring of a genre is the property
of the genre itself, not something that the reader constructs. This structure is con-
trolled by the communicative purpose(s) of the text, and is the underlying reason
that one genre varies from another. The moves of a genre are considered such an
inherent part of the genre that they can be used as the building blocks for teaching
novice writers how to successfully write texts in that genre (Dudley-Evans, 1995),
which, as already noted, was Swales' initial motivation for exploring the structure
of research article introductions.
4 Overview of the methods for move analysis
4.1 General steps of a move analysis
Kwan (2006) provides a useful introduction to the functional-semantic methods
used for identifying discourse moves. A functional approach to text analysis calls
for cognitive judgement, rather than a reliance on linguistic criteria, to identify the
intention of a text and the textual boundaries (see also Bhatia, 1993a; Paltridge,
1994). This approach is in line with the theoretical definition of a move; that is,
that each move has a local purpose but also contributes to the overall rhetorical
purpose of the text.
It is important to note that there are no strict rules for doing a move analysis,
nor does every researcher necessarily do each of the steps described below. The
intent here is to simply describe common procedures in doing a move analysis.
First, in order to identify the move categories for a genre, it is important to get a
big-picture understanding of the overall rhetorical purpose of the texts in the
genre. The second step is then to look at the function of each text segment and
evaluate what its local purpose is. This is the most difficult step. Move categories
need to be distinctive. Multiple readings of, and reflections on, the texts are needed
before clear categories emerge.
The third step is to look for any common functional and/or semantic themes
represented by the various text segments that have been identified, especially those
that are in relative proximity to each other or often occur in approximately the
same location in various texts representing the genre. These functional-semantic
themes can then be grouped together, reflecting the various steps (or strategies) of
a broader move type, with each move having its own functional-semantic contri-
bution to the overall rhetorical purpose of the text. Swales proposed the first CARS
move, Establishing a Territory, as it was clear that research article introductions
almost always began with a section that functioned to provide a context for the
study being introduced, whether this was done by claiming the centrality of the
study (Step 1), and/or by making generalizations about the topic being studied
(Step 2), and/or by reviewing items of previous research on the topic (Step 3). Not
all research article introductions have all the steps, but most have at least one of
them, serving the function of establishing the territory for the study to follow.
When a researcher is ready to segment a particular text into moves, it is best to
begin first with a pilot coding, ideally with at least two coders. Because coders are
seeking to understand the functional-semantic purposes of text segments, coding
must be done by hand. Initial analyses are then discussed and fine-tuned until
there is agreement on the functional and semantic purposes that are being realized
by the text segments, resulting in a protocol of move and step features for the
genre, with clearly defined purposes and examples.
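Although the coding itself is done by hand, the resulting protocol can be recorded in a machine-readable form so that the codes assigned by each rater can be checked against it. The sketch below is one possible representation, not a tool used in the studies reported here. It encodes the moves and steps of the CARS model from Table 2.1; the class names, the code format (for example "M2-1B"), and the helper functions are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    label: str
    name: str

@dataclass(frozen=True)
class Move:
    label: str
    purpose: str
    steps: tuple = ()

# Moves and steps as listed in Table 2.1 (CARS model)
CARS = (
    Move("M1", "Establishing a territory", (
        Step("1", "Claiming centrality"),
        Step("2", "Making topic generalization(s)"),
        Step("3", "Reviewing items of previous research"),
    )),
    Move("M2", "Establishing a niche", (
        Step("1A", "Counter-claiming"),
        Step("1B", "Indicating a gap"),
        Step("1C", "Question raising"),
        Step("1D", "Continuing a tradition"),
    )),
    Move("M3", "Occupying the niche", (
        Step("1A", "Outlining purposes"),
        Step("1B", "Announcing present research"),
        Step("2", "Announcing principal findings"),
        Step("3", "Indicating RA structure"),
    )),
)

def valid_codes(protocol):
    """Return every code a coder may assign: a bare move or a move-step pair."""
    codes = set()
    for move in protocol:
        codes.add(move.label)
        codes.update(f"{move.label}-{step.label}" for step in move.steps)
    return codes

def invalid_codes(assigned, protocol):
    """List assigned codes that the protocol does not define (e.g., typos)."""
    allowed = valid_codes(protocol)
    return [code for code in assigned if code not in allowed]

# Hypothetical codes entered by a rater; the last two are not in the protocol
problems = invalid_codes(["M1-2", "M2-1B", "M4", "M3-1C"], CARS)
```

A check of this kind only catches codes that fall outside the protocol; it says nothing about whether a valid code was applied to the right text segment, which is the question that inter-rater reliability addresses.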
For a corpus-based move analysis, this coding protocol is then applied to the
full set of texts. Inter-rater reliability should be checked to confirm that there is
agreement on what the move types are and how they are realized by text segments
(see Section 4.2 below). At this point, it may be necessary to resolve any discrepan-
cies through further discussion and analysis, and then re-code problematic texts.
It is also not uncommon that additional steps or even move types will be discov-
ered during the analysis of the full set of texts.
As noted earlier, some move structures can prove more complex than the
three-move structure of the CARS model. For example, Bhatia (1998) has noted
that fundraising discourse offers a large variety of creative options (p. 100; see
also Chapter 3). In other words, some genres, especially dynamic and persuasion-
oriented ones like fundraising letters, may have obligatory, typical, and optional
move elements, and move types may not necessarily occur in a fixed order. Never-
theless, a move structure for a genre can still be identified by working through the
general process outlined above. Table 2.3 summarizes the typical move analysis
process as it is done in a corpus-based approach.
Table 2.3 General steps often used to conduct a corpus-based move analysis
Step 1:  Determine rhetorical purposes of the genre.
Step 2:  Determine rhetorical function of each text segment in its local context; identify
         the possible move types of the genre.
Step 3:  Group functional and/or semantic themes that are either in relative proximity to
         each other or often occur in similar locations in representative texts. These
         reflect the specific steps that can be used to realize a broader move.
Step 4:  Conduct pilot-coding to test and fine-tune definitions of move purposes.
Step 5:  Develop coding protocol with clear definitions and examples of move types and steps.
Step 6:  Code full set of texts, with inter-rater reliability check to confirm that there is clear
         understanding of move definitions and how moves/steps are realized in texts.
Step 7:  Add any additional steps and/or moves that are revealed in the full analysis.
Step 8:  Revise coding protocol to resolve any discrepancies revealed by the inter-rater
         reliability check or by newly discovered moves/steps, and re-code problematic
         areas.
Step 9:  Conduct linguistic analysis of move features and/or other corpus-facilitated analyses.
Step 10: Describe corpus of texts in terms of typical and alternate move structures and
         linguistic characteristics.
The ten steps outlined in Table 2.3 correspond to the general analytical steps for
top-down analyses listed in Table 1.1 (in Chapter 1). For example, the analytical
step "Communicative/Functional Categories" in Table 1.1 corresponds to Steps
1–5 in Table 2.3. The steps "Segmentation" and "Classification" from Table 1.1 in
practice occur concurrently in Steps 6–8. The steps "Linguistic analysis of each
unit" and "Linguistic analysis of discourse categories" from Table 1.1 are reflected
in Step 9, and the final steps in Table 1.1 and Table 2.3 are the same.
While the process described here is not the only way to do a corpus-based
move analysis, in the end, the move structure should represent the "rhetorical
movement" (Swales, 1990, p. 140) of the functional-semantic purposes of the text
segments that make up the genre, and all texts in the corpus must be coded for
these distinctions.
4.2 Inter-rater reliability
For top-down approaches to discourse analysis, the first methodological steps in
the analysis involve human judgements to identify and code the discourse components
of a text. This kind of analysis requires a detailed coding rubric, which explicitly
defines the discourse components (e.g., the move types and steps). A minimal
evaluation of this rubric is to determine whether raters can achieve high
inter-rater reliability when they apply the coding scheme. That is, do different
raters understand the coding definitions in the same way, with the result that they
all identify the same discourse components in a text, and they all agree on the
classification of those text segments as move types?
The simplest method of reporting inter-rater reliability is percent agreement.
This statistic merely reflects the number of agreements per total number of coding
decisions, but it does not account for chance agreement among raters. A more
common statistic for determining inter-rater reliability is Cohen's kappa (k).
Cohen's kappa is a chance-corrected measure of inter-rater reliability that assumes
two or more raters, n cases, and m mutually exclusive and exhaustive nominal
categories (Capozzoli, McSweeney, & Sinha, 1999).
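The difference between the two statistics can be made concrete with a small sketch. It assumes exactly two raters who have each assigned one move type to the same ordered list of text segments; the function names and the ten invented codes are illustrative only and are not taken from any study discussed here. Kappa is computed in its usual form, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's category proportions.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Share of coding decisions on which the two raters chose the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of segments")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return agreements / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: probability that both raters pick category c independently.
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    if p_e == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes for ten text segments (move types labelled M1..M3).
rater_a = ["M1", "M2", "M2", "M3", "M3", "M2", "M3", "M2", "M3", "M1"]
rater_b = ["M1", "M2", "M3", "M3", "M3", "M2", "M3", "M2", "M2", "M1"]

print(round(percent_agreement(rater_a, rater_b), 2))
print(round(cohens_kappa(rater_a, rater_b), 2))
```

With these invented codes the raters agree on eight of ten segments, so percent agreement is .80, while kappa is about .69 because some of that agreement would be expected by chance alone.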
Training is generally done to achieve better and more consistent inter-rater
reliability, but more importantly, training encourages evaluators to examine the
definitions in the coding rubric, and to arrive at a more explicit description of
what each coding category represents. Inter-rater reliability should not be con-
fused with objectivity or validity; it is rather just a measure of consistency and
agreement. As noted by Raymond (1982), the degree to which inter-rater reliability
is desirable varies with what is being evaluated: "It would be possible to achieve
near perfect inter-rater reliability by simply counting the number of words
produced; but no one would seriously accept this as a measure of quality [of
writing]. Because the quality of writing resides not entirely in the text, but in the
interactions among the text, its author, and its individual readers, we should not only
expect but actually demand a reasonable amount of variation among raters, with
an inter-rater reliability of .80 being acceptable" (p. 401).
Much the same can be said about identifying move boundaries and coding
move types. Moves, by definition, perform communicative functions within a text,
but raters can differ in their understanding of the purpose of a specific text or por-
tion of a text. Nevertheless, the process of identifying and discussing discrepancies
increases inter-rater reliability among researchers and results in a more usable and
consistently interpreted move framework for a genre.
5 Using a corpus-based approach to move analysis
5.1 Corpus-based move analysis
Much of the previous discussion has focused primarily on describing and discussing
the theory behind and the process of doing a move analysis. Discourse analysis in
general, and move analysis in particular, has typically been a qualitative approach to
analyzing discourse, with studies focusing on only a few texts. This is well illustrated
by the collection edited by Mann and Thompson (1992), which includes twelve dif-
ferent analytical approaches to analyzing the discourse of one single letter.
In contrast, a corpus-based approach requires analysis of a well-designed rep-
resentative collection of texts of a particular genre. These texts are encoded elec-
tronically, allowing for more complex and generalizable research findings, reveal-
ing linguistic patterns and frequency information that would otherwise be too
labor intensive to uncover by hand (Baker, 2006, p. 2). That is not to say that a
corpus-based approach is simply a quantitative approach. Corpus-based discourse
analysis depends on both quantitative and qualitative techniques. Even with a cor-
pus-based approach, the moves and move types in each text must first be identified
and tagged individually by the researchers making qualitative judgments about the
communicative purposes of the different parts of a text. And even once quantitative
data are run, the results must still be interpreted functionally. As has been
noted previously, "Association patterns represent quantitative relations, measuring
the extent to which features and variants are associated with contextual factors.
However, functional [qualitative] interpretation is also an essential step in any
corpus-based analysis" (Biber et al., 1998, p. 4).
To summarize, what makes a corpus-based approach to move analysis different
from the traditional approach is the following:
a) analyses are done on a relatively large representative collection of texts from a
particular genre;
b) all texts are electronically encoded to allow for computerized counts and cal-
culations using different programs and software packages;
c) once the coding rubric for move types is developed, all texts in the corpus are
coded to identify the moves and code the move types;
d) analysis of the linguistic characteristics of specific move types can be easily
done in order to provide details about how different communicative purposes
are realized linguistically; and
e) in addition to conducting the traditional move analysis, quantitative counts
permit the discussion of general trends, relative frequency of particular move
types, and prototypical and alternate patterns of move type usage (this is dis-
cussed further below).
5.2 General advantages of corpus-based approaches to discourse analysis
There are several advantages to using a corpus-based approach to top-down analyses
of discourse (including move analysis and appeals analysis). Baker (2006) in his
book, Using Corpora in Discourse Analysis, outlines four advantages of using cor-
pora to analyze discourse. First, a corpus-based approach helps reduce researcher
bias. All researchers approach their research from a particular worldview; often we
are aware and take account of our biases, but often we are unaware of biases. As
Baker notes, by using a corpus, "we at least are able to place a number of restrictions
on our cognitive biases" (p. 12); overall patterns and trends are more likely to show
through when we are looking at dozens of texts rather than just one or two selected
texts. In short, corpus-based approaches help put the focus of discourse analysis on
interpretation of the data, not the data itself, by reducing the opportunity for
manipulation (conscious or unconscious) of the texts selected for analysis.
The second advantage of corpus-based discourse analysis identified by Baker
(2006) addresses what he calls the "incremental effect of discourse" (p. 13). The
primary purpose of discourse analysis is to understand how language is used, often
in quite subtle ways. A single text on its own is insignificant; however, corpus
analysis allows us to see patterns of words, phrases, structures and/or discourses
that permeate, often contrary to common sense, our language. A corpus also allows
researchers to see patterns that exist but that they might otherwise miss when
analyzing a small sample of texts because they are not overwhelmingly frequent.
The third advantage Baker (2006) gives for using a corpus-based approach to
discourse analysis is that it is much easier to identify counter-examples (resistant
discourse) on the one hand, and one is less likely to mistake them for hegemonic
or dominant discourse on the other hand (p. 14). For example, results of a
corpus-based move analysis are much more likely to represent the move and lin-
guistic structures that are in fact typical for the genre as a whole, and much less
likely to be skewed by the random selection and analysis of only a handful of texts
that may turn out to not be representative of the genre as a whole.
Lastly, Baker (2006) suggests that a significant advantage to a corpus-based
approach is that it is easily combined with other methodologies to reinforce and
strengthen the overall analysis, what is often called triangulation. For example,
the approach presented in Chapters 3–4 of the present book combines move analysis
with analysis of the linguistic characteristics of the move types to describe how
different communicative purposes are linguistically realized.
While these four advantages are relevant for all approaches to discourse analy-
sis, a corpus-based perspective offers distinct advantages to move analysis in par-
ticular, which are described in the next section.
5.3 Specific advantages of a corpus-based perspective for move analysis
5.3.1 Identifying linguistic features of moves

While one could do a move analysis of a single text, it only becomes possible to
describe the typical linguistic characteristics of move types through a corpus-based
approach. Before computerized analysis, there were attempts to summarize
the occurrence of linguistic features in genre moves. For example, Swales (1990,
pp. 131–132) summarized the findings of 40 published studies which described
the use of linguistic features in the four major sections of research articles. He
concluded that five linguistic features ("that" verb complement, present tense, past
tense, passive voice, and authors' comments or hedging) co-occur in particular
patterns to convey particular rhetorical functions. The patterns observed, based
on the five linguistic features, provide evidence for a two-way distinction between
Introduction/Discussion and Methods/Results sections. The Introduction and
Discussion sections have the functions, respectively, of providing the background
of the current study and interpretation of the results. The features frequently found
to be associated with these functions are "that" complements, present tense, and
authors' comments. The Methods/Results sections, respectively, provide information
regarding experimental procedures and present findings of the current study.
Associated with these functions are a high use of past tense and a variable use of
passive voice verb forms.
The studies cited by Swales usually analyzed selected linguistic features by
hand, looking for patterns and differences. With computers, much more interest-
ing and comprehensive linguistic analyses can be undertaken. Analyses which
take into account only individual linguistic features will reveal very little about the
co-occurrence of linguistic features and how features interact with each other in a
move to perform a particular communicative purpose. It would be more informa-
tive and useful to study the distribution and co-occurrence of many features of
language at once, rather than considering the distribution and function of indi-
vidual features singly. Computer driven, corpus-based approaches allow us to do
this. Chapters 3, 4, & 5 in this volume provide examples of how various linguistic
structures work together in unique combinations to help realize the rhetorical
purposes of the different moves identified for each genre.
It needs to be remembered that move types, and their component steps, are
identified by the functional and semantic purposes that they have. Nevertheless,
because different moves have different functional and semantic purposes, it seems
reasonable to expect that move purposes will be realized through variations in
linguistic features. This is, in fact, what Swales observed in his early analysis of
research articles: "The evidence suggests a differential distribution of linguistic
and rhetorical features across the four standard sections of the research article"
(1990, p. 136). Consequently, as noted in Chapter 1 of this volume, once texts have
been segmented into moves, it is possible to analyze the linguistic characteristics
of each move to determine the typical linguistic characteristics of the different
move types. This type of analysis has not generally been done in traditional move
analysis studies, and it can be argued that the lack of a description of the typical
lexico-grammatical characteristics of these discourse units (i.e., move types) is a
significant shortcoming of the non-corpus-based approach.
5.3.2 Move frequencies and lengths

Another advantage of the corpus-based approach to move analysis is that it allows
description of the typical distributional and structural characteristics of each move
type. That is, once moves in a corpus have been coded, a variety of descriptive
counts can be made. The most obvious of these are the overall frequency of occur-
rence of each move type in the corpus, and the average length in words of each
move type. Statistics like these allow us to make a clear determination as to whether
a particular move type is obligatory, expected, or merely optional. For example,
in the study described in Chapter 3, the third move type can be considered obligatory,
as it occurred in over 97% of the texts, while the first move type is clearly
optional, as it occurred in only about 15% of the texts. If it were not for the
corpus-based approach used to analyze this genre, this optional move might not have even
been identified, because it occurs so infrequently; or, if it had been identified, its
importance in the genre might have been overstated. Similarly, it is interesting to
note that the third move is, on average, 48 words in length, while the second move,
which occurs in 93% of the texts, is three times longer at 150 words in length. By
identifying this rather large difference in length between these two obligatory
move types, the corpus-based approach invites additional follow-up questions to
explore what the source of this difference might be.
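As an illustration of the kind of descriptive counts described above, the following minimal sketch computes, for each move type, the share of texts in which it occurs and its mean length in words. The data layout (a dictionary mapping a text identifier to a list of (move type, word count) pairs), the function name, and the four toy letters are all assumptions made for this example; they are not the coding format or the data of the study in Chapter 3.

```python
from collections import defaultdict

def move_statistics(corpus):
    """Return {move_type: (share_of_texts_containing_it, mean_length_in_words)}."""
    texts_with_move = defaultdict(set)   # move type -> ids of texts that contain it
    lengths = defaultdict(list)          # move type -> word counts of every occurrence
    for text_id, moves in corpus.items():
        for move_type, n_words in moves:
            texts_with_move[move_type].add(text_id)
            lengths[move_type].append(n_words)
    n_texts = len(corpus)
    return {
        m: (len(texts_with_move[m]) / n_texts, sum(lengths[m]) / len(lengths[m]))
        for m in sorted(lengths)
    }

# Invented toy corpus: each letter is a sequence of (move type, length in words).
toy_corpus = {
    "letter_01": [(1, 20), (2, 140), (3, 50)],
    "letter_02": [(2, 160), (3, 40), (2, 150), (3, 54)],
    "letter_03": [(2, 150), (3, 48)],
    "letter_04": [(3, 48)],
}

stats = move_statistics(toy_corpus)
for move_type, (share, mean_len) in stats.items():
    print(f"Move {move_type}: in {share:.0%} of texts, mean length {mean_len:.1f} words")
```

Note that the share is counted per text (a move type that occurs twice in one letter is counted once), whereas the mean length is taken over every occurrence of the move type.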
5.3.3 Mapping move use and locations

A computer can be used not only to count the presence of each move type in each
text but also to keep track of their positions relative to each other (e.g., first,
second, third), which other move types each most commonly co-occurs with, how
frequently a move is embedded in another move, and how frequently a move occurs
in the body of the text as opposed to, say, a P.S.
The ability to make these sorts of observations permits us to extend our analy-
sis in several ways. For example, it is possible to look at the relationship that differ-
ent move types have with each other. Again, looking ahead to the study described
in Chapter 3, the text position of two of the moves that are identified for the fund-
raising letter genre turns out to be quite predictable: although Move 1 and Move 7
are optional moves, when they are present in a direct mail letter, Move 1 occurs as
the initial move in the letter 97% (34/35) of the time and Move 7 occurs as the final
move before the complimentary close 100% (33/33) of the time. The positions of
Move 2 and Move 3 are also highly predictable. If one ignores the presence of
Move 1, Move 2 occurs as the initial move in the direct mail letter 74% (180/242)
of the time. And Move 2, regardless of its position in the letter, is immediately fol-
lowed by Move 3 87% (316/362) of the time.
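Counts of this kind can be sketched in a few lines once each text is represented as an ordered sequence of move types. The sketch below is illustrative only: the function names and the four toy sequences are invented, and the denominators it uses (texts containing the move, and occurrences of the move) are an assumption about how such proportions might be defined, not a reconstruction of how the figures above were calculated.

```python
def initial_move_rate(sequences, move, ignore=()):
    """Among texts containing `move`, count those in which it comes first
    once any move types listed in `ignore` have been set aside.
    Returns (texts_where_move_is_first, texts_containing_move)."""
    containing = first = 0
    for seq in sequences:
        kept = [m for m in seq if m not in ignore]
        if move in kept:
            containing += 1
            first += kept[0] == move
    return first, containing

def followed_by_rate(sequences, move, successor):
    """Count occurrences of `move` and how many are immediately followed by `successor`.
    Returns (occurrences_followed_by_successor, total_occurrences)."""
    occurrences = followed = 0
    for seq in sequences:
        for i, m in enumerate(seq):
            if m == move:
                occurrences += 1
                if i + 1 < len(seq) and seq[i + 1] == successor:
                    followed += 1
    return followed, occurrences

# Invented move sequences for four letters (numbers are move types).
sequences = [
    [1, 2, 3, 7],
    [2, 3, 2, 3],
    [3, 2, 3],
    [2, 4, 3],
]

first, containing = initial_move_rate(sequences, 2, ignore={1})
print(f"Move 2 is initial (ignoring Move 1) in {first}/{containing} letters")

followed, occurrences = followed_by_rate(sequences, 2, 3)
print(f"Move 2 is immediately followed by Move 3 in {followed}/{occurrences} occurrences")
```

Reporting the raw counts alongside the percentage, as in the figures quoted above, keeps the basis of each proportion visible to the reader.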
5.3.4 Genre prototypes

With statistics on move frequencies and lengths, as well as descriptions of where
in the genre a move type tends to occur and how one move type typically relates to
another, a key advantage of a corpus-based approach can be realized: the ability to
develop genre prototypes. Prototypes are particularly valuable in educational and
training contexts to help novices learn to understand and produce a genre that is
new to them. In the study described in Chapter 3, for example, three different
prototypes of the genre are provided. The first includes only the obligatory moves,
the second adds the expected moves, and the third is based on all moves (includ-
ing the optional ones). In these prototypes, not only can the different move types
be included, but typical and alternate locations of moves relative to other moves in
the text can be described. In addition, if linguistic analysis or other follow-up anal-
yses of the individual moves were done, the prototypes can represent these fea-
tures as well. Prototypes such as these are also very useful in understanding better
the genre variation that occurs between different disciplines. For example,
Kanoksilapatham in Chapter 4 shows that the moves in the introduction sections of
biochemistry research articles vary somewhat from the CARS model that Swales has
proposed (introduced earlier in this chapter).
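The idea of layered prototypes can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in Chapter 3: the cutoffs separating obligatory, expected, and optional moves (90% and 50%) are hypothetical, the "typical rank" used for ordering is a simplification, and "Move X" is an invented move type added only so that the middle category is not empty. The occurrence rates for Moves 1 to 3 approximate the figures cited earlier in this chapter.

```python
def build_prototypes(moves, obligatory_cutoff=0.90, expected_cutoff=0.50):
    """moves: list of (label, share_of_texts, typical_rank) tuples.
    Returns three move sequences, each ordered by typical rank.
    Both cutoffs are hypothetical values chosen for illustration."""
    ordered = sorted(moves, key=lambda m: m[2])
    obligatory = [label for label, share, _ in ordered if share >= obligatory_cutoff]
    expected = [label for label, share, _ in ordered if share >= expected_cutoff]
    everything = [label for label, _, _ in ordered]
    return {
        "obligatory": obligatory,
        "obligatory+expected": expected,
        "all": everything,
    }

moves = [
    ("Move 1", 0.15, 1),   # optional: about 15% of texts
    ("Move 2", 0.93, 2),   # 93% of texts
    ("Move 3", 0.97, 3),   # over 97% of texts (approximated here as 0.97)
    ("Move X", 0.60, 4),   # invented move type, for illustration only
]

protos = build_prototypes(moves)
for name, sequence in protos.items():
    print(f"{name}: {' > '.join(sequence)}")
```

A fuller prototype would also record alternate positions and the linguistic characteristics of each move; this sketch covers only the selection and ordering of move types.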
6 Summary
This chapter has introduced the top-down approach used most often by applied
linguists for the analysis of discourse structure: move analysis. While discourse
analysis has often been concerned with sentence-level features in writing or gen-
eral modes of writing such as narration, description, and comparison and contrast,
move analysis has given researchers and practitioners useful text-focused tools.
We first discussed the theoretical and empirical underpinnings of traditional
move analysis. We then presented a description of corpus-based move analysis,
with steps that followed the guidelines proposed in Chapter 1 for top-down analy-
ses of discourse structure. The chapter concluded with a discussion of the added
advantages of a corpus-based approach to move analysis. These include the ease of
identifying the linguistic characteristics of the moves, their frequencies and
lengths, and the mapping of their use and location in the overall discourse struc-
ture of texts. Chapters 3 and 4 put this model into practice, presenting corpus-
based move analyses of fund raising letters (Chapter 3) and biochemistry research
articles (Chapter 4).
Chapter 3
Identifying and analyzing rhetorical moves in philanthropic discourse
In Chapter 2, we described the general approach that can be used to identify
and analyze the moves of a genre. In this chapter, we will describe and expand
on a study (see note 1) that uses a move analysis to show the rhetorical structure of direct
mail letters, a type of philanthropic discourse. This study illustrates how move
analysis is done and provides a model that can be used for the study of other
genres, especially genres from professional (e.g., business, legal, medical)
contexts. Furthermore, a review of the corpus that was used in this study and
its characteristics will provide a useful example of a specialized corpus that is
essential to conducting a move analysis of a specific discourse genre.
1 Background
Philanthropic discourse (fundraising texts like direct mail letters or magazine
advertisements) seeks to persuade, inform, request, catch one's eye, wrench one's
heart, and twist one's arm, all in a tidy, attractive package. The weight upon these
texts is, in fact, enormous. Nonprofit organizations depend to a larger or smaller
extent on fund-raising texts for operating expenses or for funding to accomplish
capital goals. And yet, the various genres of philanthropic discourse have not been
closely studied.
Indeed, Bhatia (1998) claims that the discourse of fundraising represents one
of the most dynamic forms of language use. For a relatively limited number of
communicative functions, this discourse form offers a large variety of creative op-
tions, some rarely used before. It is a category of genre that offers an interesting
and challenging profile of linguistic realizations to achieve a limited set of generic
objectives (Bhatia, 1998, p. 100).
1. This chapter draws on material previously published in the following two articles: (1) Upton,
T. A. (2002). Understanding direct mail letters as a genre. International Journal of Corpus
Linguistics 7(1), 65–85; and (2) Connor, U. & Upton, T. A. (2003). Linguistic dimensions of
direct mail letters. In C. Meyer & P. Leistyna (Eds.), Corpus Analysis: Language Structure and
Language Use (pp. 71–86). Amsterdam: Rodopi Publishers.
The dynamic nature of philanthropic discourse is due to the fact that it is de-
signed to be quite persuasive. In short, its primary purpose is to persuade people
to contribute to worthy causes or to underwrite philanthropic programs (Connor,
2000). Because of its persuasive purposes, fundraising has a great deal in common
with promotional materials such as sales letters and job applications, in which the
purpose is to sell something: in sales letters, a service or product; in letters of
application, a person's abilities; in fundraising, a worthy cause (Bhatia, 1993a;
Connor & Wagner, 1998).
Recent studies of philanthropic discourse, specifically fund-raising texts, have
for the most part employed a qualitative approach, analyzing characteristics such
as communicative functions (Bhatia, 1997b; Connor, 1997), rhetorical patterns
(Abelen, Redeker, & Thompson, 1993; Crismore, 1997; Lauer, 1997), social
contexts (Bazerman, 1997a; Myers, 1997), metaphors (McCagg, 1997), and cultural
differences (Connor & Wagner, 1998; Graves, 1997). Although these studies have
contributed to our understanding of the language of fund raising, the qualitative
nature of these studies left us without an empirical baseline for comparing the
general features of fundraising texts with those of other common texts. Of particu-
lar interest are the types of rhetorical moves that are used to define the different
genres of philanthropic discourse.
What was missing was a corpus-based study of fundraising texts to develop such
a baseline. The Indiana Center for Intercultural Communication (ICIC), with
funding from and in cooperation with the Indiana University Center on Philan-
thropy, undertook a concerted effort to carefully study the language of fundraising
by collecting a large corpus of fundraising material and then studying, among
other things, the rhetorical moves in these genres. The focus of the present chapter
is on the direct mail letters used by non-profit agencies to introduce readers to or
remind them about what the agency does, the clientele/services they are involved
with, and/or the needs that they have that the reader is being asked to assist with,
usually financially. Specifically, this study will first investigate the discourse
structure typical of the letters in the corpus, using move analysis, and then provide
a linguistic description of the grammatical stance features that each move most
commonly draws on to accomplish its particular function in the genre.
2 A specialized corpus of fundraising texts
The fundraising letters analyzed in this study are part of the ICIC Fundraising
Corpus, which includes over 900 fundraising documents from 236 organizations
and totals nearly 2 million words. The documents in the corpus include direct mail
letters, newsletters, case statements, grant proposals, and annual reports. Table 3.1
shows the total number of organizations, items and words for each text type in the
corpus.
Table 3.1 ICIC fundraising corpus document types
Type of Text              Org. n   Item n      Word n
Direct Mail Letter           108      316     191,540
Invitation, Newsletter       172      445     922,212
Case Statement                12       13     121,780
Grant Proposal                27       69     156,021
Annual Report                 51       84     523,770
Total                        370      927   1,915,323
Note: Org. n = the number of organizations represented in this type. Item n = the number of
items of this type in the corpus. Word n = the number of words in the documents of this type in
the corpus.
The present study focuses on the genre of direct mail letters, and thus uses only
that component of the ICIC corpus. Letters were collected from five major types of
organization; Table 3.2 shows the number of organizations, number of letters, and
words broken down by these organization categories.
Table 3.2 ICIC fundraising corpus direct mail letters by organization type
Type of Organization      Org. n   Item n    Word n
Health/Human Services         33       91    54,187
Environmental                 10       13     8,126
Community Development         10       17    10,875
Education                     27      118    72,583
Arts and Culture              16       63    37,485
Other                         12       14     8,284
Total                        108      316   191,540
The ICIC Fundraising Corpus was designed to represent a specific type of discourse
(fundraising texts) and to represent specific genres within that domain.
The sub-corpus for the genre of direct mail letters was further designed to repre-
sent the range of variation found for this genre. To prevent any skewing of the
corpus towards the writing of any one organization or non-profit field, effort was
made to collect letters from a wide variety of non-profit organizations across a
wide variety of non-profit fields (e.g., health and human services, education).
3 Determining and analyzing discourse moves: Direct mail letters
3.1 Previous analysis of direct mail letters
Direct mail letters for nonprofit fundraising have the general purpose of selling a
product: a good cause. It has been noted (Connor & Upton, 2003) that a whole
industry has developed around direct mail letters in nonprofits, as experts offer
their advice for fundraisers in books and newsletters. It is fair to say, though, that
the advice given in many of these materials often comes from the knowledge base
of mass marketing rather than a careful analysis of the language actually used.
Frequently, a great deal of emphasis is put on the physical appearance of the letter,
while an examination of language use, for the most part, does not appear to be an
important consideration. For example, even though the need for donor segmenta-
tion is frequently recommended, little concrete advice is given about how to ap-
peal to specific audiences.
Linguists' interest in the direct mail letter is relatively new. As far as we are
aware, there have only been three research studies published by linguists that focus
on the fundraising direct mail letter. The edited book by Mann and Thompson
(1992) showcased the merits of particular linguistic/rhetorical analyses (such as
the Rhetorical Structure Theory and the topical structure analysis); however, the
purpose of their volume was not necessarily to advance knowledge about the fun-
draising letter as a text type. Abelen, Redeker, and Thompson (1993) offered more
valuable linguistic/rhetorical information about direct mail fundraising letters,
but their focus was a cross-cultural comparison of fundraising letters written by
Dutch and American writers in one type of non-profit (based on analysis of only 8 letters).
The third article, by Upton (2002), is most relevant here and will be described in
more detail below.
3.2 A move analysis of fundraising letters: Background and methodology
3.2.1 Move types

Upton (2002) conducted a study using the ICIC Fundraising Corpus (ICIC-FC) with the goal of providing a
better, and more definitive, understanding of the discourse structure that under-
lies the persuasive aspect of direct mail letters. This study drew on the work done
by Bhatia (1998), who did a preliminary move analysis on a small set of direct mail
letters. Using a comprehensive, rigorous, and sustained analysis of data, a research
team at ICIC identified a seven-move structure.
Move Type 1, Get Attention: The communicative, functional purpose of this first
move type was to get and focus the reader's attention at the start of the letter. This
move type could be realized through one of two steps. Step 1 is to start by offering
some type of general pleasantries. Step 2 is to start with a quotation or story of
some sort or a shocking or unexpected statement. Examples from letters in the
corpus of Move Type 1, as expressed through one or both of its two steps, are given
in Table 3.3.
Table 3.3 Examples of Move Type 1, Steps 1 & 2, from corpus
Move Type 1: Get Attention
Optional Steps:
Step 1: Pleasantries
    1996 is off to a fast start!
    What a Summer! And we're just getting started!
Step 2: Quotation, story or shocking/unexpected statement
    I learned about gardening when I was very young from my parents. They always
      had a garden and now so do I. The garden that I have now is very different
      from the garden that my parents grew. Dad would start planting about the
      fifteenth of April. He had two acres to plow so he used a mule and a plow.
      My garden now is very different from my dad's garden
    "Philanthropy is the rent we pay for the joy and privilege we have for our
      space on this earth." Jerold Panas.
    Cecilia desperately searched for medical care for her unborn child. She would
      have a better chance of getting help and delivering a healthy baby if she
      lived in Sweden. But Cecilia lives in central Indiana. Cecilia might even be
      your neighbor.
Move Type 2, Introduce the Cause and/or Establish Credentials: This move type
serves two general functions. It focuses on establishing the credentials of the or-
ganization by highlighting what the organization does and the contribution it can
make, and/or it serves to introduce the cause/need that the organization seeks to
address. For many non-profit organizations, their primary or even sole purpose is
to address a particular need; they talk about who they are and what they do in the
context of what the cause is. Consequently, these two functions are considered
part of one move type: introduce the cause and/or establish credentials of organization.
This move type could be expressed by any one or more of the following four
steps: 1) indicating a general problem or need, 2) highlighting a specific problem
or need, 3) highlighting the successes of past organization efforts, and 4) outlining
the mission of the organization. Examples from letters in the corpus of Move Type
2, as expressed through its four steps, are given in Table 3.4.
Table 3.4 Examples of Move Type 2, Steps 1–4, from corpus
Move Type 2: Introduce the cause and/or establish credentials of organization
Optional Steps:
Step 1: Indicate general problem/need
    One of the biggest challenges you face may be to find qualified, educated
      people to fill positions in your company... Indy Reads is working to change
      that!
Step 2: Highlight specific problem/need
    This summer, more than 300 children ages 4 through 14 will attend the YWCA of
      Indianapolis "Everyone belongs" Summer Day Camp... As you can imagine, a
      summer like this is expensive to provide. And more than 30% of the kids we
      serve cannot afford the camp fee.
Step 3: Highlight the successes of past organization efforts
    My name is Joe Cooper. Last year I was so proud to be named student of the
      year that I thought my chest was going to burst when I was on stage. I
      learned first hand what GILL is all about, giving to others unselfishly.
Step 4: Outline the mission of the organization
    Young women are growing up in an ever-changing society. As a contributor to
      the Council in past appeals I know that you are aware of our mission--to
      prepare girls with ethical values, character, a desire to succeed and a
      commitment to their community.
Move Type 3, Solicit Response: In the pilot study, it was observed that many letters
not only requested support but also sought some other type of response, such as
volunteering to help or contacting the organization for further information. Con-
sequently, this move type was labeled solicit response, which was realized by one
of two steps or both. Step 1, soliciting financial support, has three options: Step
1A, state benefit of support to the need/problem; Step 1B, ask directly for pledge/
donation; and Step 1C, remind of past support to encourage future support. Step
2, soliciting other response, requests a response from the reader other than finan-
cial, such as volunteering to help. Examples from letters in the corpus of Move
Type 3, as expressed through its two steps, are given in Table 3.5.
Table 3.5 Examples of Move Type 3, Steps 1 & 2, from corpus
Move Type 3 Solicit Response

Optional Steps:
Step 1 Solicit financial support
Step 1A: State benefit of support to the need/problem
You can help more than 200,000 people with just one
gift... Your one gift to United Way of Central Indiana
supports 82 human service agencies... Only if you con-
tribute this year can these agencies continue to provide
programs and services that: Strengthen Families; In-
vest In Our Children; Serve The Elderly And Disa-
bled; Help People Become Self-Sufficient; Promote
Health And Well-Being
And that's why I'm writing to you today. I urge you to continue
to make a difference in the lives of individuals like
Cecilia and her son. You can literally help save a life.
Step 1B: Ask directly for pledge/donation
Please send your gift today.
Please send the largest contribution you can comfortably
make.
Step 1C: Remind of past support to encourage future support
Last year your memorial gift of $5 for hospice care in March
gave VNSF, Inc. the ability to address the needs of patients I
described above. I am asking that you consider supporting
our efforts once again this year with a similar gift.
You have helped make Goodwill's work possible with
your previous support.
Step 2 Solicit other response
Every year we seek companies, organizations and individuals
to sponsor one or more of our families... If you
are interested and would like more information, please
contact... We would like to have families matched with
sponsors non later than...
I'd be glad to respond to any questions you might have
about our work. You may call me at...
Move Type 4, Offer Incentives: In Move Type 4, the writer offers an incentive, or
indicates some other benefit of giving. In our analysis, we found that this move type
could be realized in one of two ways, either by Step 1, which is the offer of a tangi-
ble (e.g., a mug, a matching donation) incentive, or by Step 2, the noting of an
intangible (e.g., a good feeling) incentive for giving. Examples from letters in the
corpus of Move Type 4, as expressed through its two steps, are given in Table 3.6.

Move Type 4 Offer Incentives
Optional Steps:
Step 1 Offer of Tangible Incentive
We'll send you our newsletters, invitations and membership
cards.
As an Indiana resident, your Federal tax-deductible con-
tribution also qualifies for a special Indiana State Income
Tax credit of 50%.
Your membership fee assures your receiving notices of ex-
hibition openings, lectures, discounts for Saturday School
and the Pre-College Workshop, and invitations to the Janus
Ball, artists' dinners and other Friends only events.
Step 2 Offer of Intangible Incentive
When your gift helps an outstanding student become an
outstanding teacher, you will know that you, too, have
touched the future.
I am sure you will feel good about giving.
If you enjoy reading the stories... there is an excellent
chance that you will enjoy membership in the Indiana
Historical Society.
Move Type 5, Reference Insert: Move Type 5 is a simple, straightforward structure
that is used to draw attention to material beyond the letter itself that was included
in the mailing, such as a brochure, a pledge form, or a return envelope. Two exam-
ples of Move Type 5 from the corpus are:
(1) I have enclosed a return envelope for your convenience, as well as an overview
of the services we provide.
(2) I have enclosed a brochure which tells you more about the Chancellor's Circle
and which includes a reply card. I have also enclosed a reply envelope for your
convenience.
When analyzing the direct mail letters in the corpus, it became clear that Move
Type 4 Offer Incentives and Move Type 5 Reference Insert were often embedded in
other move types. Take, for example, the following sentence: Please fill out the
enclosed card to send in your tax-deductible contribution to help support the
boys and girls at Camp X (emphasis added). The primary function of this sen-
tence is to solicit a financial response, Move Type 3, but there are two other func-
tions it seeks to accomplish: offering an incentive for contributing (tax-deducti-
ble), which is Move Type 4, and bringing attention to the enclosure (the enclosed
card), which is Move Type 5. It was decided to view this sentence and others like it
as containing three move types: the primary move of soliciting support and the
embedded moves of referencing insert and offering incentive. Consequently, the
two moves referencing insert and offering incentive were seen as being capable of
either standing alone or being embedded in other moves. A longer example of how
these two move types can be embedded in a longer move type, often Move Type 3,
is the following, with tags included to mark where move types start and stop:
(3) <begin Move Type 3> Let me assure you that we would appreciate receiving
one million dollars from you. But let me also assure you that we would ap-
preciate equally well any contribution you are able to make. Whatever you can
contribute, you will be helping to support a geology student at (university).
Your <begin Move Type 4> tax-deductible <end Move Type 4> contribution
may be sent <begin Move Type 5> in the enclosed postage-paid envelope with
the attached return card. <end Move Type 5> <begin Move Type 4> As an
Indiana resident, your gift qualifies for a special tax credit of 50% (up to a
maximum of $100 for an individual or $200 for a joint return). <end Move
Type 4> <begin Move Type 5> For your convenience, I am enclosing a copy of
Form CC-40, which should be filed with your Indiana State Income Tax. <end
Move Type 5> Please give today. <end Move Type 3>
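The tagging scheme in example (3) lends itself to automatic processing. The following is a minimal Python sketch, not the procedure used in this study: it assumes that tags have exactly the form <begin Move Type N> and <end Move Type N> and that they are properly nested, and the function name parse_moves is hypothetical. It reports each move with its nesting depth, so that embedded moves (depth greater than 0) can be separated from primary moves.

```python
import re

# Tags are assumed to look exactly like "<begin Move Type 3>" / "<end Move Type 3>".
TAG = re.compile(r"<(begin|end) Move Type (\d+)>")


def parse_moves(tagged):
    """Return (move_type, depth, text) for every tagged move.

    depth 0 marks a primary move; depth >= 1 marks a move embedded in another.
    Spans are returned in the order in which their end tags occur.
    """
    spans = []   # finished moves
    stack = []   # open moves as (move_type, start offset in the untagged text)
    plain = ""   # the text seen so far with all tags removed
    pos = 0
    for match in TAG.finditer(tagged):
        plain += tagged[pos:match.start()]
        pos = match.end()
        kind, move = match.group(1), int(match.group(2))
        if kind == "begin":
            stack.append((move, len(plain)))
            continue
        if not stack or stack[-1][0] != move:
            raise ValueError(f"unmatched end tag for Move Type {move}")
        _, start = stack.pop()
        text = " ".join(plain[start:].split())  # normalize whitespace
        spans.append((move, len(stack), text))
    if stack:
        raise ValueError(f"Move Type {stack[-1][0]} is never closed")
    return spans


# Shortened version of example (3).
sample = (
    "<begin Move Type 3> Your <begin Move Type 4> tax-deductible "
    "<end Move Type 4> contribution may be sent <begin Move Type 5> in the "
    "enclosed postage-paid envelope. <end Move Type 5> Please give today. "
    "<end Move Type 3>"
)
spans = parse_moves(sample)
embedded = [move for move, depth, _ in spans if depth > 0]
```

In this shortened sample, Move Types 4 and 5 come back as embedded moves and Move Type 3 as the primary move that contains them, which mirrors the analysis of example (3).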
Move Type 6, Express Gratitude: This move type, which is used to express thanks,
is realized by one or both of two steps. Step 1 offers thanks for past financial or
other support, and Step 2 offers thanks for current as well as future financial (or
other) support. Examples from letters in the corpus of Move Type 6, as expressed
through its two steps, are given in Table 3.7.

Move Type 6 Express Gratitude
Optional Steps:
Step 1 Thanks for Past Financial or Other Support
Thank you for your past gift to the Girl Scout Capital
Campaign.
I want to thank you for your past support of the Visiting
Nurse Service Foundation, Inc.
Step 2 Thanks for Current & Future Financial or Other Support
Your support is greatly needed and greatly appreciated.
Their appreciation and enthusiasm for what they are do-
ing will go a long way to thank you for your encourage-
ment and support.
Thank you again for sharing our hope for a future with-
out cancer.
Move Type 7, Conclude with Pleasantries: While not occurring as frequently as
the other move types, one final move type, conclude with pleasantries, comes at the
end of the letters and its communicative function is to bring the letter to a pleasant
close. Examples of Move Type 7 include the following:
(4) May you be blessed, today and always.
(5) I hope you have a nice day.
(6) Happy Holidays!
The complete move structure for direct mail letters is given in Table 3.8.
Table 3.8 Move structure of non-profit direct mail fundraising letters
Move Type 1: Get attention
Move Type 2: Introduce the cause and/or establish credentials of org.
    Step 1 General problem/need indicated, and/or
    Step 2 Specific problem/need highlighted, and/or
    Step 3 Successes of past organization efforts highlighted, and/or
    Step 4 Goals of future organization efforts outlined
Move Type 3: Solicit response
    Step 1 Solicit financial support
        Step 1A State benefit of support to the need/problem, and/or
        Step 1B Ask directly for pledge/donation, and/or
        Step 1C Remind of past support to encourage future support, and/or
    Step 2 Solicit other response
Move Type 4: Offer incentives
    Step 1 Offer of Tangible Incentive, and/or
    Step 2 Offer of Intangible Incentive
Move Type 5: Reference insert
Move Type 6: Express gratitude
    Step 1 Thanks for Past Financial or Other Support, and/or
    Step 2 Thanks for Current & Future Financial or Other Support
Move Type 7: Conclude with pleasantries
3.2.2 Structural elements

All of the letters in the direct mail corpus include text that strikes the reader as
somehow different than the text in the body of the letter. Things like the date, ad-
dress information, and even the signature and the signature footer have a very
different function in the direct mail letter than the communicative functions
served by the move types described above. Their functions, while important and
in many respects required, are more structural in nature than communicative.
These features of the direct mail letters are called structural elements. According to
Crossley (2007), discussing the related genre of cover letters, "It appears that while
structural elements are important to the framing of a cover letter, their individual
meaning is not so dependent upon the writer's intention as much as upon their
inclusion by the writer. Structural elements are for the most part standardized
patterns that rarely differ from one writer to another" (p. 7). In many respects, move
types are to structural elements as lexical words are to function words. Describing
the latter relationship, Biber et al. (1999) see lexical words as "the main building
blocks of texts," while function words are "the mortar which binds the text
together" (p. 55); on a larger, genre level, move types can be seen as the main
building blocks of the direct-mail letter while the structural elements provide the
(boilerplate) scaffolding around which the letter is built.
The structural elements that are frequently found in direct mail letters were
examined to see what role they might play in the persuasive appeal of these letters.
Table 3.9 below describes the seven basic structural elements that can appear in
direct mail letters. As noted above, these elements are clearly something different
than the seven move types outlined in Table 3.8. They do not have clear or major
communicative functions, and they are for the most part very constrained (e.g.,
the date or writer's name) or highly formulaic in nature (e.g., the salutation, the
complimentary close).
While the study of these elements is tangential to the goals of discourse
analysis, many instructional materials designed to train writers specifically address
and stress the importance of using these various elements to make direct mail let-
ters more persuasive (e.g., Cone, 1987; Lewis, 1997). Consequently, as practition-
ers view these structural elements as an important part of the direct mail letter,
and they are intended to have an impact on the reader, they seemed worth examin-
ing; structural elements are included here as they are represented in virtually all
direct mail letters and in fact can be viewed as markers that are used to help
identify this text type.
Table 3.9 Direct mail Structural Elements
Element A: Date line
The date when the letter was written/sent is given.
    January 10, 1998
Element B: Address information
The address of the addressee is given. This provides a level of formality to the letter.
    Joy Us Donor
    123 Boulevard Road
    Here, There 45678
Element C: Salutation
This is the opening greeting of the letter and is followed either by no punctuation, a comma, or
a colon.
    Dear Joy Donor,
Element D: Complimentary Close
This is the word or phrase that draws the letter to a close and is followed by either a comma or
no punctuation.
    Sincerely yours,
    On behalf of our clients,
Element E: Signature
This is the author's penned signature.
Element F: Signature footer
This provides the printed name of the letter signer and/or the title of the signer.
    Nahn Prophet
    President
Element G: Footnote information
This is information located after everything else in the letter and indicates that there is other
information the reader should be aware of.
    enclosure
    cc
3.3 Analysis
Using the rubrics given in Table 3.8 outlining the rhetorical moves of the direct
mail letters and in Table 3.9 outlining the structural elements, two raters hand-
coded the rhetorical moves and structural elements in all 242 letters in the corpus.
As noted in Section 3.1 of Chapter 2, individual moves often reappeared through-
out a letter, and each appearance was counted as a distinct occurrence; as a result
a single move type could occur multiple times. Inter-rater reliability was calculated
at 84%, with all discrepancies reconciled through discussion. The vast majority of
discrepancies that occurred between the two raters resulted from initial disagree-
ment as to where one move ended and the next started, not as to the presence of a
particular move. This inter-rater reliability is quite good, since, as Bhatia notes,
"there are sometimes cases which will pose problems and escape identification or
clear discrimination, however fine a net one may use. After all, we are dealing with
the rationale underlying linguistic behavior rather than its surface form" (Bhatia,
1993a, p. 93). Once all of the moves were agreed upon and marked, each letter was
then tagged to indicate the start and stop of each move in each text.
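The text does not state how the 84% inter-rater reliability figure was computed. One common and simple measure is percent agreement, sketched below in Python as an illustration only: it assumes that both raters coded the same sequence of text units, and the function name and the sample codes are hypothetical rather than taken from the corpus.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of coding units to which two raters assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both raters must code the same units")
    if not codes_a:
        raise ValueError("no coded units supplied")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Hypothetical codes (move types) for six text units; not data from the corpus.
rater_1 = [2, 2, 3, 3, 5, 6]
rater_2 = [2, 2, 3, 4, 5, 6]
agreement = percent_agreement(rater_1, rater_2)  # 5 of 6 units match
```

Percent agreement does not correct for chance agreement, and it presupposes that the raters agree on unit boundaries, which is exactly where most of the discrepancies reported above arose.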
The sequence of each move type and structural element for each text was also
noted. This allowed for the tracking of the total frequency of each move type in the
corpus, their relative locations in each letter (e.g., first, second, third), what other
move types a move most commonly occurred with, how frequently a move was
embedded in another move, and how frequently a move type occurred in the body
of the text as opposed to in a P.S.
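Once each letter is reduced to its sequence of move types, tallies of this kind are straightforward to compute. The Python sketch below is illustrative only: it assumes each letter is represented as a list of move-type numbers in order of occurrence, and the function name and the three sample sequences are hypothetical.

```python
from collections import Counter


def summarize(sequences):
    """Tally move statistics over a list of per-letter move sequences."""
    total = Counter()     # occurrences of each move type in the corpus
    letters = Counter()   # number of letters containing each move type
    first = Counter()     # move type that opens each letter
    adjacent = Counter()  # unordered pairs of neighbouring move types
    for seq in sequences:
        total.update(seq)
        letters.update(set(seq))
        if seq:
            first[seq[0]] += 1
        for a, b in zip(seq, seq[1:]):
            if a != b:
                adjacent[frozenset((a, b))] += 1
    return total, letters, first, adjacent


# Three hypothetical letters; not sequences taken from the corpus.
sequences = [[2, 3, 2, 3, 6], [1, 2, 3], [3, 2, 3, 5]]
total, letters, first, adjacent = summarize(sequences)
```

Counting occurrences and letters separately matters here because a move type that recurs within a letter adds to the first tally several times but to the second only once.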
3.4 Results
Move Type Frequencies and Lengths: Table 3.10 provides summary information
about the moves in this corpus of 242 direct mail letters, including the frequency
of each move type, the number of letters that contained each move type, and the
average number of words per move type. Not surprisingly, the most common
move type in all of these letters was Move Type 3 Solicit Response, which occurs
546 times. This represents 39% of all the moves occurring in this corpus, showing
up at the average rate of 2.3 times per letter.
Table 3.10 Move totals, percentages and rates of occurrence
                              Move 1  Move 2  Move 3  Move 4  Move 5  Move 6  Move 7
Moves
  Total number                    35     362     546     113     153     148      33
  % of total moves              2.5%   26.0%   39.3%    8.1%   11.0%   10.7%    2.4%
Letters
  w/ at least 1 occurrence        35     226     236      85     127     124      31
  % of total letters             15%     93%     97%     35%     52%     51%     13%
Words/move
  Avg.                            39     150      48      29       9      10      10
In fact, of the 242 letters, only six letters did not have at least one Move Type 3 oc-
curring at some point in the letter, with Move Type 3 represented in 97% (236/242)
of the letters. The second most common move was Move Type 2 Introduce the cause
and/or establish credentials of the organization, which occurred 362 times. At the
rate of 1.5 times per letter, this move represents 26% of all the moves in this corpus.
Move Type 2, like Move Type 3, also clearly seems to be a required move (that is,
one that almost every letter uses) in this genre as it occurs in 93% of the letters.
Move Type 4 (Offer Incentives) at 8.1% of the total moves, Move Type 5 (Reference
Insert) at 11.0%, and Move Type 6 (Express Gratitude) at 10.7% occurred at
relatively similar rates of frequency across the 242 letters. While apparently op-
tional move types within this genre, each occurred fairly regularly in these letters:
Move Type 4 was represented at least once in 35% of the letters, Move Type 5 oc-
curred in 52% of the letters, and Move Type 6 occurred in 51% of the 242 letters.
Move Type 1 (Get attention) and Move Type 7 (Conclude with pleasantries)
were clearly icing-on-the-cake moves that writers of this genre could draw upon
when desired but did not do so very frequently. Move Type 1 represented only
2.5% of the moves in this corpus and occurred in only 15% of the letters. Similarly,
Move Type 7 represented 2.4% of the moves in this corpus and occurred in only
13% of the letters.
It is further possible to compare the lengths of each of these move types. Move
Type 2 is by far the longest move in this genre, averaging 150 words per occur-
rence. Move Type 3, the second longest move, is only one-third the length, at 48
words per occurrence. Move Types 5, 6 and 7 are the shortest, with Move Type 5
averaging 9 words per occurrence, and Move Types 6 and 7 averaging 10 words per
occurrence.
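The shares and per-letter rates quoted above follow directly from the move totals in Table 3.10. The short Python sketch below shows the arithmetic, using the totals and the corpus size of 242 letters from the text; the function names are hypothetical, and values recomputed this way may differ from the printed figures in the last digit because of rounding.

```python
LETTERS = 242  # number of letters in the corpus
MOVE_TOTALS = {1: 35, 2: 362, 3: 546, 4: 113, 5: 153, 6: 148, 7: 33}


def share_of_moves(totals):
    """Each move type's percentage of all move occurrences."""
    grand_total = sum(totals.values())
    return {move: 100.0 * n / grand_total for move, n in totals.items()}


def rate_per_letter(totals, n_letters):
    """Average number of occurrences of each move type per letter."""
    return {move: n / n_letters for move, n in totals.items()}


shares = share_of_moves(MOVE_TOTALS)
rates = rate_per_letter(MOVE_TOTALS, LETTERS)
# Move Type 3: 546 of 1,390 moves, i.e. about 39.3%, or about 2.3 per letter.
```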
Structural Elements: Table 3.11 shows the relative frequency of each of the
structural elements of the direct mail letters in this corpus.
Table 3.11 Percentage of letters with each structural element
Structural Elements                 Percent of Letters
Element A: Date Line                77%
Element B: Address Information      51%
Element C: Salutation               88%
Element D: Complimentary Close      90%
Element E: Signature                89%
Element F: Signature Footer         87%
Element G: Footnote Information      7%
The vast majority of the letters in this corpus contained four structural elements: an
opening salutation (88%), a complimentary close (90%), a signature (89%), and a
typed signature footer (87%). The date line (77%) and address information (51%) were
more optional, while footnote information is included relatively infrequently (7%).
3.5 Discussion
Based on the results of the genre analysis of the 242 direct mail letters in this cor-
pus, a couple of observations can be made about how moves are used within the
genre. First of all, some of these moves are nearly obligatory in the genre, while
others seem to be merely optional. Secondly, it seems clear that the juxtaposition of the
moves relative to each other shows meaningful patterns. Move Type 2 (Introduce
the cause and/or establish credentials of organization) and Move Type 3 (Solicit re-
sponse) are the most important moves in this genre. The preeminence of these two
moves can be seen by the fact that not only do they occur in nearly every direct mail
letter in the corpus, but they generally occur more than once, they usually occur as
the first and second moves in the letter, they are by far the longest of the moves, and
they almost always occur in juxtaposition to each other.
That Move Types 2 and 3 are the most prominent in frequency, size, and
position in the letter is not surprising. At its most basic level, the purpose of the
direct mail letter is to tell the readers what the organization is and/or what the
need is, and to request funds to help the cause. These functions are accomplished
in these two moves. In contrast, the other five moves serve as optional tools that
individual writers in this genre can incorporate in various ways to tailor the effect
of the letter on the reader.
For example, Move Types 4 (Offer Incentives) and 5 (Reference Insert) clearly
play a secondary role in the direct mail letter as they tend to be quite short in length
and often embedded in another move, usually Move Type 3 (Solicit Response). Nev-
ertheless, their role appears to be an important one in that they are included in a
sizeable percentage of the letters (Move Type 4 in 35%; Move Type 5 in 52%).
Essentially, it seems their function is to serve as a reminder: In the case of Move Type
4, the readers most often are reminded either that contributions to non-profit or-
ganizations are tax-deductible, or that they will feel good about the contribution
that they make. With Move Type 5, the function of this move is simply to remind
the readers to look at other material that has been included with the letter.
Move Type 6 (Express Gratitude), occurring in 51% of the letters, also plays an
important role of informing the readers how much the organization appreciates
their support. Nevertheless, this role is noticeably a secondary one when the fre-
quency, number of occurrences and length of this move are considered in relation
to Move Types 2 (Introduce the cause and/or establish credentials of organization)
and 3 (Solicit response). Move Types 1 (Get attention) and 7 (Conclude with pleas-
antries) are clearly optional moves, with both of them occurring in fewer than 15%
of the letters.
Similar observations can be made about the structural elements that are in-
cluded; clearly there are some that are considered obligatory, such as the salutation
(Element C) and complimentary close (Element D), and others that are more
optional, such as address information (Element B). The facts that most of these
structural elements occur in most direct mail letters, and that practitioners themselves
view these as essential components of the direct mail letter (e.g., Cone, 1987)
suggest that more careful analysis of these may be warranted in future studies. Indeed,
it could be argued that at least some of these elements should be viewed as moves
in themselves, as they are functional units of text serving a specific purpose that
adds to the persuasive nature of the letters. Textual choices within these structural
elements, for example how to phrase the salutation, are actually quite significant
and can be viewed as something beyond a standardized template.
3.6 Letter prototypes
One strength of this type of corpus analysis is that it allows us to develop proto-
types of the genre. Three such prototypes suggest themselves from these data. The
first prototype might be one that represents the most basic form of the direct mail
letter, using the moves and structural elements which occur in at least 85% of the
letters in the corpus. These include Move Types 2 (introduce the cause and/or es-
tablish credentials of organization) and 3 (solicit response), and Structural Elements
C (salutation), D (complimentary close), E (signature), and F (signature footer). An
example of such a letter is provided in Table 3.12.
A second prototype might include all the moves and the structural elements
that occurred in over 50% of the letters in this corpus. These include Move Types
2, 3, 5 (reference insert) and 6 (express gratitude) as well as Structural Elements A
(date line), B (address information), C, D, E and F. An example of such a letter is
provided in Table 3.13.
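The selection of components for the first two prototypes amounts to applying a frequency cutoff to the percentages reported in Tables 3.10 and 3.11. The Python sketch below reproduces that selection as an illustration; the function name and its inclusive parameter are hypothetical, and the percentages are the rounded values given in the tables.

```python
# Percentages of letters containing each component (Tables 3.10 and 3.11).
MOVE_PCT = {1: 15, 2: 93, 3: 97, 4: 35, 5: 52, 6: 51, 7: 13}
ELEMENT_PCT = {"A": 77, "B": 51, "C": 88, "D": 90, "E": 89, "F": 87, "G": 7}


def select(percentages, cutoff, inclusive=True):
    """Return the keys whose percentage meets the cutoff, in sorted order."""
    if inclusive:
        return sorted(k for k, pct in percentages.items() if pct >= cutoff)
    return sorted(k for k, pct in percentages.items() if pct > cutoff)


# First prototype: components found in at least 85% of the letters.
basic = (select(MOVE_PCT, 85), select(ELEMENT_PCT, 85))
# Second prototype: components found in over 50% of the letters.
fuller = (select(MOVE_PCT, 50, inclusive=False),
          select(ELEMENT_PCT, 50, inclusive=False))
```

With these cutoffs the first selection yields Move Types 2 and 3 with Elements C through F, and the second adds Move Types 5 and 6 and Elements A and B, matching the two prototypes described above.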
Table 3.12 Prototype direct mail letter representing move types and structural elements
which occurred in 85% of the corpus.
Structural Element C  Mr./Mrs. Smith
Move Type 2           Now more than ever, inner city girls need your support to help their dreams
                      become a reality.
                      Each generation of girls faces new challenges: new technology, new moral
                      issues, new opportunities. Inner City Girls experience a wide range of real life
                      skills -- first aid, resume writing, and managing money. They also reap benefits
                      that are difficult to measure, including enhanced self-esteem, greater
                      confidence in their abilities, and the strength and conviction to take the lead and
                      excel in their endeavors.
                      We start early. As a preventative, informal education program, Inner City
                      Girls helps girls relate to others, develop values, contribute to their society,
                      and develop their own potential. This results in reduced risk of teen
                      pregnancy, suicide, truancy, substance abuse and so many other crises.
Move Type 3           Your gift to the 1997 Inner City Girls Annual Campaign helps to ensure that
                      girls will continue to receive the benefits that Inner City Girls offers. Today's
                      girls will be tomorrow's leaders and they are counting on you.
Structural Element D  Sincerely,
Structural Element E  (Signature)
Structural Element F  Sally Mentor
                      President
                      1997 Inner City Girls Annual Campaign
Table 3.13 Prototype direct mail letter representing move types and structural elements
which occurred in 50% of the corpus
Structural Element A  October 26, 2000
Structural Element B  Sam Q. Doe
                      123 Street Dr.
                      Somewhere, IN 46202
Structural Element C  Dear Sam,
Move Type 2           For many of the children and seniors that Help Your Neighbor cares for, the
                      Holiday season can be a troubling time. Nearly every day HYN receives a call
                      about a patient or family in need of home care who has limited financial
                      resources. Calls for help from families that need the crisis services HYN
                      provides for their children ring throughout the season. This is not the ringing that
                      you and I traditionally picture during the holiday season.
Move Type 3           But there is something that you can do to help. With your gift of sharing, you
                      are:
                      *providing needed home care services to the most needy
                      *giving emergency respite to families of children at risk for neglect or
                       abuse
                      *helping establish a Golden Touch program to provide companionship
                       and homemaker services to homebound seniors.
Move Type 2           Help Your Neighbor has been a part of this community for over 85 years.
                      Serving the needy has been an important part of our mission. Over the last ten
                      years, HYN has delivered over $1 million worth of free services to the citizens
                      of Somewhere.
Move Type 3           But we cannot do it alone. We need your help. A gift of sharing can bring
                      comfort and hope to those most in need during this holiday season. Please use
Move Type 5           the enclosed envelope to make a contribution to help us ease the suffering and
                      indeed ring in a most joyous holiday season.
Move Type 6           I thank you for your generous support.
Structural Element D
Structural Element E
Structural Element F  Bob L. Brown
                      President & CEO
A third prototype might simply show what a direct mail letter would look like if it
used each of the possible move types and structural elements that define this gen-
re; Table 3.14 provides an example of such a letter. It should be pointed out, how-
ever, that most real-world direct-mail letters do not use all seven possible rhetori-
cal move types and, in fact, only one letter in this corpus did.
Table 3.14 Prototype direct mail letter representing all possible move types and struc-
tural elements
Structural Element A  October 26, 2000
Move Type 1           Do all the good you can, by all the means you can, in all the ways you can, in
                      all the places you can, at all the times you can, to all the people you can, as long
                      as ever you can. John Wesley
Structural Element B  Sam Q. Doe
                      123 Street Dr.
                      Somewhere, IN 46202
Structural Element C  Dear Sam,
Move Type 2           Ebenhazer cares for at-risk children and families. We do this through a wide
                      range of programs including community-based, therapeutic foster care, group
                      homes and our treatment center. Many of the children are victims of abuse or
                      live in unstable homes.
Move Type 3           This Christmas season we are asking you to take a few minutes to consider
                      making a contribution to Ebenhazer to help the 1,500 children and families
                      that we care for.
Move Type 2           Many of the children have no homes; no memories of joy from past holidays.
                      Others are from families that are struggling to provide a healthy, happy
                      environment but don't have the resources to make it possible.
Move Type 3           Your contribution will make a difference in a child's life. It may help a family
                      stay together. It can certainly make happy holiday memories.
                      A gift to Ebenhazer means the children in our care will have presents to open.
                      A gift means a family will have a holiday meal, cooking utensils to prepare the
Move Type 5           meal and dishes to serve it on. Your gift will go beyond the holiday season. It
                      can help purchase clothing, school supplies, books and educational tools
Move Type 4           throughout the year.
                      Please use the enclosed donation card and return envelope and mail your
                      tax-deductable donation to Ebenhazer today.
Move Type 6           Thank you in advance for your gift.
Move Type 7           We wish you and your family a new year full of joy and love.
Structural Element D
Structural Element E
Structural Element F  Mary Smith
                      Director
Move Type 3PS         P.S. Let our families and children know you want them to have the same kind
                      of memories of the holidays you will have. Please give generously.
Move Type 6PS         Thank you for thinking of Ebenhazer this Christmas season.
Structural Element G  Enclosures
4 Linguistic analysis of moves: Tracking the use of stance structures
As introduced in Chapter 1, the goal of this book is to move beyond simply seg-
menting texts into well-defined discourse units (in this case, moves); the goal is
also to analyze the linguistic characteristics of each individual discourse unit and
each discourse unit type (i.e., the move types), to determine the typical linguistic
characteristics of the units. Although they are defined in functional terms, moves
are constructed from linguistic devices, including word choice, phrase types, and
grammatical features (e.g., tense, aspect, voice). Many of these linguistic devices
are used to express stance: "personal feelings, attitudes, value judgments, or
assessments" (Biber et al., 1999, p. 966). Linguistic features used for these functions are
especially important in direct-mail letters.
There have been numerous studies of the linguistic mechanisms used by
speakers and writers to convey their personal feelings and assessments, carried out
under several different labels, including evaluation (Hunston, 1994; Hunston &
Thompson, 2000), intensity (Labov, 1984), affect (Ochs, 1989), evidentiality
(Chafe, 1986; Chafe & Nichols, 1986), hedging (Holmes, 1988; Hyland, 1996a,
1996b), persuasion (Hyland, 2004a), and stance (Barton, 1993; Beach & Anson,
1992; Biber, 2004, 2006a, 2006b; Biber & Finegan, 1988, 1989; Biber et al., 1999,
Chapter 12; Conrad & Biber, 2000; Hyland, 1999b; Precht, 2000). In the present
case, we adopt the framework of stance devices developed in Biber et al. (1999)
and Biber (2006a,b) to analyze the ways in which move types in direct-mail letters
differ linguistically.
Because non-profit direct mail letters are overtly persuasive in nature, there is
little question that stance plays an important role in this genre. We are interested
in looking at how the use of stance structures (as opposed to other expressions of
stance, like word choice) varies from move to move. We believe that identifying
stance structures could be important in untangling the language structures used in
direct-mail letters and provide a better basis for describing the function of the different
moves in the genre.
4.1 Identifying grammatical stance devices
According to Biber et al. (1999), the five most common grammatical devices used
to express stance are: 1) stance adverbials, 2) stance complement clauses
(specifically that and to clauses), 3) modals, 4) premodifying stance adverbs (e.g., I'm
so happy for you.), and 5) stance nouns followed by prepositional phrases. While
Biber (2006a; 2006b) has previously analyzed the use of grammatical stance de-
vices in specific registers (comparing spoken and written academic registers), this
study seeks to compare and contrast the use of these stance devices across the
move types within a single genre.
Each move was automatically tagged using a grammatical tagger. While the
tagging program, developed by Biber, identifies a wide variety of linguistic features
(see Appendix Two), we focused here only on those grammatical devices that ex-
press stance. These features are given in Table 3A at the end of the chapter. The rate
of occurrence for each stance feature within each move type was calculated. In the
following discussion, we focus on only the stance features that occurred at least 3
times per 1,000 words.
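A rate per 1,000 words is a normalized frequency: the raw count of a feature divided by the number of words in the text sample, multiplied by 1,000. The Python sketch below illustrates the calculation and the cutoff of 3 per 1,000 words; the function names are hypothetical, and the counts and word total are invented for illustration rather than taken from the corpus.

```python
def rate_per_1000(count, n_words):
    """Normalized frequency of a feature per 1,000 words of text."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return 1000.0 * count / n_words


def frequent_features(counts, n_words, cutoff=3.0):
    """Keep the features whose normalized rate reaches the cutoff."""
    rates = {name: rate_per_1000(c, n_words) for name, c in counts.items()}
    return {name: round(r, 1) for name, r in rates.items() if r >= cutoff}


# Invented counts for one move type totalling 5,000 words; not corpus data.
counts = {
    "modals of prediction/volition": 72,
    "stance adverbials of certainty": 9,
}
common = frequent_features(counts, 5000)
```

Normalizing in this way allows move types of very different lengths, such as Move Type 2 and Move Type 5, to be compared on the same scale.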
4.2 Interpreting the use of grammatical stance devices used in moves
As expected, since each of the moves has very different rhetorical functions within this
persuasion-motivated genre, the seven different move types all use different combina-
tions of grammatical stance devices. Table 3.15 provides a breakdown of the results by
move, showing those stance devices that occurred at a rate of at least 3 per 1,000 words.
Table 3.15 Common grammatical stance devices by move type
Move Type / Stance structure occurring at least 3 times/1000 words             Rate/1000 words
Move 1: Get attention
    Stance adverbials of certainty                                                  7.9
    Modals of possibility/permission/ability                                        7.0
    Modals of prediction/volition                                                  13.1
Move 2: Introduce cause/establish credentials
    NA
Move 3: Solicit response
    Modals of possibility/permission/ability                                       12.1
    Modals of prediction/volition                                                  14.4
    To-complement clauses controlled by (all) stance verbs                          7.8
Move 4: Offer incentives
    Modals of possibility/permission/ability                                        7.2
    Modals of prediction/volition                                                  19.7
Move 5: Reference insert
    Modals of necessity/obligation                                                  3.4
Move 6: Express gratitude
    Modals of prediction/volition                                                  11.2
    To-complement clauses controlled by desire/intention/decision stance verbs      4.4
    To-complement clauses controlled by all stance verbs                            5.6
    Pre-modifying stance adverbs                                                    3.0
Move 7: Conclude w/ pleasantries
    Stance adverbials of certainty                                                  5.6
    To-complement clauses controlled by desire/intention/decision stance verbs      8.5
    Pre-modifying stance adverbs                                                    7.2
Table 3.15 provides the basis for interpreting how the different moves in this genre
tend to use the different grammatical structures of stance in order to accomplish
their rhetorical purpose.
The purpose of Move Type 1 (Get Attention) is to engage the reader and get
him/her interested in the cause/need being promoted. The move typically contains
a quotation, story, or strong general pleasantries. The fairly strong reliance on
modals of possibility/ability and modals of prediction has the purpose of empowering
the reader and showing that the reader can make a difference. This can
be seen in the following examples.
Modals of possibility/ability (italics added to show usage):
(7) You might hear some ugly talk this summer.
(8) YOU can be the one to open the door.
Modals of prediction (italics added to show usage):

(9) The urgency you feel to make changes is just the extent that change will be
made.
(10) Until he extends the circle of his compassion to all living things, man will not
himself find peace.
Stance adverbials of certainty, the other stance structure frequently used in Move
Type 1, contribute to getting the reader's attention by underscoring the need.
Stance adverbials of certainty (italics added to show usage):
(11) Please send a million dollars so we can really support geological activities
here at IUPUI in perpetuity.
(12) (quoting Margaret Mead) Never doubt that a small group of thoughtful,
committed citizens can change the world, indeed it's the only thing that ever has.
Move Type 2, Introducing the cause and establishing credentials, did not include any
especially frequent use of specific stance structures. Looking at the letters in the
corpus more carefully, it appears that this move is written in a more matter-of-fact
manner. Unlike the other moves, the emphasis in this move is on content and
facts (what the organizations do and what the needs are) rather than on
personal feelings, attitudes, value judgments, or assessments. For example:
(13) The number of companies reporting a shortage of skilled workers almost
doubled from 1995 to 1998; from 27 percent to more than 47 percent. Did you
know that about 20 percent of America's workers have low basic skills and
75 percent of unemployed adults have reading or writing difficulties?
Indy Reads is working to change that!
(14) In 1985 a group of courageous pioneering women established the YWCA of
Indianapolis to meet the needs of women, and in 1998 the tradition continues.
The YWCA of Indianapolis still focuses on, supports, and gives empowerment
to women and their families. Empowerment refers to meeting the
needs of girls and women so that they can freely exercise the power to determine
and direct their lives.
Move Type 3, Solicit response, can incorporate one of two steps, either soliciting
financial support or soliciting another response (a non-financial contribution from
readers, such as volunteering to help). The stance structures most commonly used
in Move Type 3 are modals of possibility and ability, modals of prediction and
volition, and to-complement clauses controlled by stance verbs. Looking again
more closely at the letters themselves, Move Type 3 frequently uses modals of pos-
sibility and ability in order to state the benefit of support for the reader. The modal
can, indicating ability, is by far the most common (occurring 188 times); the mo-
dal may is the next most common (occurring 45 times), indicating possibility:
(15) You can help people reach their dreams of reading and learning by making a
contribution to Indy Reads.
(16) It may help a family stay together.
Modals of prediction and volition, on the other hand, were typically used to ask
directly for a pledge or donation:
(17) Will you help them change?
(18) We hope you will become a partner of Indy Reads
To-complement clauses controlled by stance verbs most frequently appear at the
end of the move and play a role in making clear what it is the organization wants
the reader to do in response to the letter.
(19) We are hopeful that you will agree to help.
(20) If you have any questions or concerns at any time, please do not hesitate to
call me.
(21) When you are contacted by your Campus Campaign volunteer, we hope
you'll choose to become one of the many partners in the community of
IUPUI.
Move Type 4, Offer incentives, makes frequent use of modals of possibility,
permission, and ability, but modals of prediction and volition are used at an extremely
high rate of nearly 20 times per 1,000 words. These structures parallel those used
in Move Types 1 and 3, but support very different rhetorical purposes. Modals of
possibility/permission/ability typically are tied to offers of tangible incentives,
as illustrated by examples (22) and (23).
(22) I hope we can include your name among the list of inaugural members of the
1994 Black Cane Society.
(23) Based on each individual tax situation, your gift may be tax deductible
Modals of prediction and volition are also used to support reciprocal offers, includ-
ing offers of tangible incentives (24, 25) and offers of intangible incentives (26).
(24) Corporate contributors will be acknowledged in our newsletter, annual re-
port and on the Indy Reads webpage.
(25) However, these tax credits are only available for a limited time, so we ask that
you act soon if you would like to use them.
(26) I am sure you will feel good about giving.
Move Type 5, Reference insert, uses only one grammatical stance structure
consistently, but this structure, modals of necessity and obligation, is not used regularly
by any other move. Looking at this structure in context, it is clearly used to direct
readers' attention to materials included with the letter.
(27) For your convenience, I am enclosing a copy of Form CC-40, which should
be filed with your Indiana State Income Tax.
What is most interesting about the regular use of this particular stance structure in
this move is that it is very directive, explicitly telling the reader what s/he must,
should, or ought to do. This is a rather surprising structure to see in a letter such as
this whose whole purpose is to persuade a reader to make a financial (usually)
contribution; telling someone they have to do something (when they really don't)
is usually not a successful persuasion tactic. Nevertheless, within Move 5, this
stance structure does not come across as inappropriate, primarily because it is not
part of the solicitation itself but points the reader to steps that will benefit him/
herself, rather than the agency.
Move Type 6, Express gratitude, commonly uses modals of prediction and vo-
lition (28 and 29), as well as to-complement clauses controlled by stance verbs (30
and 31) to thank readers in advance for potential donations.
(28) Your check will be greatly appreciated.
(29) I would like to thank you for your commitment to dental hygiene educa-
tion.
(30) I want to thank you for your help.
(31) I want to express my gratitude to those of you who have already pledged or
contributed in 1991.
The level of appreciation for the gift is frequently signaled in this move through the
use of pre-modifying stance adverbs, as illustrated by the following two examples.
(32) Thank you so much for your help.
(33) I can only hope that you know how appreciative we at the Indianapolis Zoo
are of your philanthropy.
Move Type 7, Conclude with pleasantries, is the only move other than Move Type
1 to use stance adverbials of certainty. In this move, this structure always occurs in
rather formulaic expressions, as shown by the following examples.
(34) May you be blessed, today and always, as you so generously share your blessing.
(35) I am always happy to hear from you about your accomplishments.
Move Type 7 also has the highest rate of to-complement clauses controlled by de-
sire/intention/decision stance verbs, but like the adverbials of certainty, these all,
without exception, occur in short, formulaic structures that tend to end the letter.
(36) I hope to see you there.
(37) I hope to hear from you soon.
Lastly, just as Move Type 6 uses pre-modifying stance adverbials for emphasis,
Move Type 7 uses this structure in the same way. In fact, the only pre-modifying
adverb that is commonly used in this move is the adverb so before an adjective: so
great, so closely, so generously.
In sum, all the move types in the fundraising letters, with the exception of
Move Type 2, frequently use one or more grammatical stance devices, and the
combinations of grammatical structures used are distinctive, with no two moves
using the same set of structures. Our results suggest that different moves use
somewhat different stance structures, which supports the need to teach different
strategies for different moves. Overall, however, the results were unexpected in showing
a rather limited use of stance structures. Modals of possibility/ability and
prediction were used along with to-complement clauses. Largely missing, however, were
many stance features that are typically considered part of persuasive discourse, e.g.,
modals of obligation, stance adverbials, and premodifying stance adverbs. The lack of
variety in stance use suggests a discourse that treads carefully, does not take
strong positions, and does not put strong demands on the reader.
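The claim that no two moves use the same set of structures can be checked mechanically against Table 3.15. The sketch below simply transcribes the table into a dictionary and compares the sets pairwise; the names TABLE_3_15 and shared_profiles are hypothetical, and Move 2 is represented by an empty set because no structure reached the threshold.

```python
from collections import Counter
from itertools import combinations

CERTAINTY = "stance adverbials of certainty"
POSSIBILITY = "modals of possibility/permission/ability"
PREDICTION = "modals of prediction/volition"
NECESSITY = "modals of necessity/obligation"
TO_ALL = "to-complement clauses controlled by all stance verbs"
TO_DESIRE = "to-complement clauses controlled by desire/intention/decision stance verbs"
PREMOD = "pre-modifying stance adverbs"

# Stance structures reaching the threshold in each move, as listed in Table 3.15.
TABLE_3_15 = {
    "Move 1": {CERTAINTY, POSSIBILITY, PREDICTION},
    "Move 2": set(),
    "Move 3": {POSSIBILITY, PREDICTION, TO_ALL},
    "Move 4": {POSSIBILITY, PREDICTION},
    "Move 5": {NECESSITY},
    "Move 6": {PREDICTION, TO_DESIRE, TO_ALL, PREMOD},
    "Move 7": {CERTAINTY, TO_DESIRE, PREMOD},
}


def shared_profiles(table):
    """Return every pair of moves that uses exactly the same set of structures."""
    return [(a, b) for a, b in combinations(table, 2) if table[a] == table[b]]


# How many moves rely on each structure.
usage = Counter(s for structures in TABLE_3_15.values() for s in structures)
identical_pairs = shared_profiles(TABLE_3_15)
```

An empty list of identical pairs is consistent with the observation above, while the usage counts show how heavily the genre leans on a small number of modal classes.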
A previous study using Biber's multi-dimensional features (Connor & Upton,
2003) had suggested that fundraising letters as a genre are similar to academic
prose, a finding which was unexpected. The current study further supports this
general finding: both genres use a limited range of grammatical stance features,
restricted primarily to modal verbs and to-complement clauses (compare the
findings here to those reported in Biber 2006a,b for academic prose). Thus, despite
their apparent differences in communicative purpose, we see here that these two
genres are surprisingly similar in the kinds of stance expressed and the particular
linguistic devices used for these functions.
5 Final thoughts
One goal of this chapter was to outline a general approach that can be used to iden-
tify and analyze the moves of a genre in a corpus of texts, and to provide a specific
and detailed example of how this type of analysis can be done. As noted in Chapter
2, a move analysis seeks to identify the components (moves) of a genre by the com-
municative purposes they serve. These communicative purposes must be identified
within the context of the genre as well as the social context in which the genre resides
(e.g., fundraising direct mail relationships). The question that we are seeking to
answer with this type of analysis is, "What are the rhetorical structures that address the
specific purposes of the genre, and if these vary, how so?" Because we are seeking to
understand why a genre is structured the way it is, and because it is important to ac-
count for the socio-cultural, institutional, and organizational influences on a specific
genre, there is naturally a subjective element to this sort of analysis.
However, despite its subjective nature, certain guidelines can be followed that
enable an empirical analysis of moves in a corpus of texts (see also Chapter 2).
First, extreme care must be taken to collect good data. In the present case, the
corpus of fundraising discourse was well planned (involving the input of both
fundraising practitioners and linguists) and carefully documented, and was large
enough to provide reliable results. Then, a series of pilot studies was run with a
research team, first to develop a working set of genre-specific move types with
distinct definitions, and then to confirm the inter-rater reliability of using these
moves to analyze the individual texts in the corpus.
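One common way to quantify inter-rater reliability for categorical codes such as move types is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The chapter does not say which statistic was used, so the sketch below is an assumption offered for illustration only; the function name cohen_kappa and the two coding sequences are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same sequence of text segments."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of segments")
    n = len(rater_a)
    # Proportion of segments on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal distribution.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical move codes assigned by two raters to six segments of one letter.
codes_a = [1, 1, 2, 3, 3, 4]
codes_b = [1, 1, 2, 3, 2, 4]
kappa = cohen_kappa(codes_a, codes_b)
```

Here the raters agree on five of six segments (observed agreement 0.83), chance agreement is 0.25, and kappa is therefore about 0.78.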
Once the move types were clearly defined and all of the documents in the data
set coded (and checked by multiple raters), the next step was to look for patterns
in how and when the different moves were used in order to help explain the spe-
cific role of each move more broadly within the genre. The goal was to have a full
understanding of the communicative purposes and functions that different parts
of the genre have and how they work together to accomplish the overall commu-
nicative aim of the genre.
Nevertheless, although a move analysis uses communicative function as the
starting point for understanding the rhetorical purposes of a genre, the expecta-
tion is that these distinct functions are realized through the use of distinct and
consistent linguistic features. Consequently, it should be possible to see variation
in linguistic patterns from one move to the next.
The second and more important goal of this chapter for the purposes of this
book was to show the contribution that a corpus-based approach could make in
the analysis of discourse structure (e.g., move structure). By analyzing generic
moves in a fairly large specialized corpus of direct mail letters collected from mul-
tiple non-profit organizations of various types (e.g., environmental, education), we
are able to generalize the findings and develop representative prototypes that can
be used for exemplification and training. It then becomes possible to authorita-
tively compare and contrast the discourse structures of different types of texts in
order to gain a clearer understanding of how each uniquely accomplishes its com-
municative purposes.
In addition, using a corpus of texts to analyze discourse structures makes it
much easier to identify alternate ways (steps) for accomplishing common func-
tions (moves); such variations can be easily missed or misinterpreted when look-
ing at individual texts. In the same vein, a corpus-based analysis makes it easy to
identify which moves (and steps) are more common, even required, and which are
optional or idiosyncratic and can be used at the discretion of the writer without
the reader feeling the text is non-standard or inappropriate. In a more detailed
analysis, a corpus-based approach will even permit generalizations about where
different discourse structures occur within a typical text, and where they occur
relative to other structures (i.e., before, after, or within).
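Once every text is coded as a sequence of move types, generalizations of this kind reduce to simple counts. The sketch below assumes each text is represented as an ordered list of move numbers; the four coded letters are invented for illustration and are not drawn from the corpus, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical move sequences for four letters (not real corpus data).
coded_letters = [
    [1, 2, 3, 6, 7],
    [2, 3, 4, 6],
    [1, 2, 3, 5, 7],
    [2, 3, 6, 7],
]


def move_coverage(texts):
    """Share of texts that contain each move type at least once."""
    counts = defaultdict(int)
    for moves in texts:
        for move in set(moves):
            counts[move] += 1
    return {move: counts[move] / len(texts) for move in sorted(counts)}


def mean_relative_position(texts):
    """Average position of each move, scaled so 0 is text-initial and 1 is text-final."""
    positions = defaultdict(list)
    for moves in texts:
        last = max(len(moves) - 1, 1)
        for index, move in enumerate(moves):
            positions[move].append(index / last)
    return {move: sum(vals) / len(vals) for move, vals in sorted(positions.items())}


coverage = move_coverage(coded_letters)
position = mean_relative_position(coded_letters)
```

In this toy data, moves with a coverage of 1.0 would be candidates for obligatory status, while a low coverage marks a move as optional; the relative positions show which moves tend to open or close a text.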
A corpus-based analysis also allows for the detailed analysis of the linguistic
characteristics of the discourse units that have been identified. In the discourse anal-
ysis done in this chapter, the corpus-based approach allowed us to make detailed
observations about the specific grammatical stance devices that each of the different
move types used to accomplish their unique functions in the genre. Further analysis
would likely reveal other linguistic differences among these move types.
Table 3A Grammatical devices used to express stance
1. Stance adverb(ial)s (See Biber et al., 1999, pp. 557-558, 853-874)

Expressing Certainty:
actually, always, certainly, definitely, indeed, inevitably, in fact, never, of course, obviously, really,
undoubtedly, without doubt, no doubt
Expressing Likelihood:
apparently, evidently, kind of, most cases, most instances, perhaps, possibly, predictably, probably,
roughly, sort of, maybe
Expressing Attitude:
amazingly, astonishingly, conveniently, curiously, disturbingly, hopefully, even worse, fortunately, impor-
tantly, ironically, regrettably, rightly, sadly, sensibly, surprisingly, unbelievably, unfortunately, wisely
Expressing Style:
accordingly, according to, confidentially, figuratively, frankly, generally, honestly, mainly, strictly,
technically, truthfully, typically, reportedly, primarily, usually
2. Complement clauses controlled by stance verbs, adjectives, or nouns

2.1 Stance verb + that-clause. (See Biber et al., 1999, pp. 661-670)
Verbs Expressing Certainty:
acknowledge, affirm, ascertain, calculate, certify, check, conclude, confirm, decide, deem, demon-
strate, determine, discover, find, know, learn, mean, meant, meaning, note, notice, observe, prove,
realize, recall, recognize, recollect, record, remember, see, show, signify, submit, testify, understand
Verbs Expressing Likelihood:
appear, assume, believe, bet, conceive, consider, deduce, detect, doubt, estimate, figure, gather,
guess, hypothesize, imagine, indicate, intend, perceive, postulate, predict, presuppose, presume,
reckon, seem, sense, speculate, suppose, suspect, think, wager
Verbs Expressing Attitude:
accept, admit, agree, anticipate, boast, complain, concede, cry, dream, ensure, expect, fancy, fear, feel,
forget, foresee, guarantee, hope, mind, prefer, pretend, reflect, require, resolve, trust, wish, worry
Verbs Expressing Speech Act (and other communication verbs):
add, announce, advise, answer, argue, allege, ask, assert, assure, charge, claim, confide, confess,
contend, convey, convince, declare, demand, deny, emphasize, explain, express, forewarn, grant,
hear, hint, hold, imply, inform, insist, maintain, mention, mutter, notify, order, persuade, petition,
phone, pray, proclaim, promise, propose, protest, reassure, recommend, remark, reply, report,
respond, reveal, say, shout, state, stress, suggest, swear, sworn, teach, telephone, tell, urge, vow, warn,
whisper, wire, write
2.2 Stance verb + to-clause. (See Biber et al., 1999, pp. 693-715)
Verbs Expressing Probability (likelihood):
appear, happen, seem, tend
Verbs Expressing Cognition/perception:
assume, believe, consider, estimate, expect, felt, find, forget, hear, imagine, judge, know, learn, pre-
sume, pretend, remember, see, suppose, take, trust, understand, watch
Verbs Expressing Desire/Intention/Decision:
aim, agree, bear, care, choose, consent, dare, decide, design, desire, dread, hate, hesitate, hope, in-
tend, like, look, love, long, mean, need, plan, prefer, prepare, refuse, regret, resolve, schedule, stand,
threaten, volunteer, wait, want, wish
Verbs Expressing Causation/Modality/Effort:
afford, allow, appoint, arrange, assist, attempt, authorize, bother, cause, counsel, compel, defy, de-
serve, drive, elect, enable, encourage, endeavor, entitle, fail, forbid, force, get, help, inspire, instruct,
lead, leave, manage, oblige, order, permit, persuade, prompt, require, raise, seek, strive, struggle,
summon, tempt, try, venture
Verbs Expressing Speech Act (and other communication verbs):

ask, advise, beg, beseech, call, claim, challenge, command, convince, decline, heard, invite, offer,
pray, promise, prove, remind, report, request, say, said, show, teach, tell, urge, warn
2.3 Stance adjective + that-clause. (See Biber et al., 1999, pp. 671-674; many of these occur with
extraposed constructions)
Adjectives Expressing Certainty:
accepted, apparent, certain, clear, confident, convinced, correct, evident, false, impossible, inevita-
ble, obvious, positive, proved, plain, right, sure, true, well-known
Adjectives Expressing Likelihood:
doubtful, likely, possible, probable, unlikely
Adjectives Expressing Attitude/Emotion:
adamant, afraid, alarmed, amazed, amused, angry, annoyed, astonished, aware, careful, con-
cerned, curious, depressed, disappointed, dissatisfied, distressed, disturbed, encouraged, frightened,
glad, grateful, happy, hopeful, hurt, irritated, mad, pleased, reassured, relieved, sad, satisfied,
shocked, surprised, thankful, unaware, uncomfortable, unhappy, unlucky, upset, worried
Adjectives Expressing Evaluation:
acceptable, advisable, amazing, annoying, anomalous, appropriate, awful, conceivable, critical,
crucial, desirable, dreadful, embarrassing, essential, extraordinary, fitting, fortunate, funny, good,
great, horrible, imperative, incidental, inconceivable, incredible, indisputable, interesting, ironic,
lucky, natural, neat, necessary, nice, notable, noteworthy, noticeable, obligatory, odd, okay, para-
doxical, peculiar, preferable, ridiculous, sensible, shocking, silly, sorry, strange, stupid, sufficient,
surprising, tragic, typical, unacceptable, unaware, uncomfortable, understandable, unfair, unfortu-
nate, unthinkable, untypical, unusual, upsetting, vital, wonderful
2.4 Stance adjective + to-clause. (See Biber et al., 1999, pp. 716-721; many of these occur with
extraposed constructions)
Adjectives Expressing Certainty/Likelihood:
apt, certain, due, guaranteed, liable, likely, prone, unlikely, sure
Adjectives Expressing Attitude/Emotion:
afraid, amazed, angry, annoyed, ashamed, astonished, concerned, content, curious, delighted, dis-
appointed, disgusted, embarrassed, free, furious, glad, grateful, happy, impatient, indignant, nerv-
ous, perturbed, pleased, proud, puzzled, relieved, sorry, surprised, worried
Adjectives Expressing Evaluation:
awkward, appropriate, bad, best, better, brave, careless, convenient, crazy, criminal, cumbersome,
desirable, dreadful, essential, expensive, foolhardy, fruitless, good, important, improper, inappropriate,
interesting, logical, lucky, mad, necessary, nice, reasonable, right, safe, sick, silly, smart, stupid, sur-
prising, useful, useless, unreasonable, unseemly, unwise, vital, wise, wonderful, worse, wrong
Adjectives Expressing Ability/Willingness:
able, anxious, bound, careful, competent, determined, disposed, doomed, eager, eligible, fit, greedy,
hesitant, inclined, insufficient, keen, loath, obliged, prepared, quick, ready, reluctant, set, slow, suf-
ficient, unable, unwilling, welcome, willing
Adjectives Expressing Ease or Difficulty:

difficult, easier, easy, hard, impossible, pleasant, possible, tough, unpleasant
2.5 Stance noun + that-clause. (See Biber et al., 1999, pp. 648-651)
Nouns Expressing Certainty:
assertion, conclusion, conviction, discovery, doubt, fact, knowledge, observation, principle,
realization, result, statement
Nouns Expressing Likelihood:
assumption, belief, claim, contention, expectation, feeling, hypothesis, idea, implication, impres-
sion, indication, notion, opinion, possibility, presumption, probability, rumor, sign, suggestion, sus-
picion, thesis
Nouns Expressing Attitude/Perspective:
grounds, hope, reason, view, thought
Nouns Expressing Communication:
comment, news, proposal, proposition, remark, report, requirement
2.6 Stance noun + to-clause. (See Biber et al., 1999, pp. 652-653)
agreement, authority, commitment, confidence, decision, desire, determination, duty, failure, incli-
nation, intention, obligation, opportunity, plan, potential, promise, proposal, readiness, reluctance,
responsibility, right, scheme, temptation, tendency, threat, wish, willingness
3. Modal and semi-modal verbs (See Biber et al, 1999, pp. 483ff.)
Modals Expressing Possibility/Permission/Ability:
can, could, may, might
Modals Expressing Necessity/Obligation:
must, should, (had) better, have to, got to, ought to
Modals Expressing Prediction/Volition:
will, would, shall, be going to
4. Premodifying stance adverb (stance adverb + adjective or noun phrase)

Most common premodifying adverbs (See Biber et al, 1999, pp. 544ff):
Adverbials + adjectives (It was perfectly quiet.)
awfully, completely, extremely, how, perfectly, quite, really, slightly, so, totally, very
Adverbials + nouns (It is almost time; It was quite a surprise.)
about, almost, completely, quite, really
5. Stance noun + prepositional phrase (of + NP or for + NP)

(See 2.5 and 2.6 for list of stance nouns used.)
Chapter 4
Rhetorical moves in biochemistry research articles
Budsaba Kanoksilapatham
The study described in this chapter provides another example of the powerful
descriptive nature of a corpus-based, top-down approach to discourse analysis.¹
Unlike previous move-based studies of research articles, this is the first study
to undertake a comprehensive coding of all the moves in a fairly large corpus
that represents all four sections (introduction, methods, results, discussion),
followed by an analysis of the linguistic structures that make up those moves. In
keeping with the steps introduced in Table 1.1, the first steps in the study (after
the compilation of a representative corpus) were to identify the rhetorical move
types used in biochemistry research articles, segment the texts into moves, and
then identify the specific move type each represents. Then, following steps 4-5
described in Table 1.1, multidimensional analysis (see Appendix 1) was used to
identify the linguistic characteristics of each rhetorical move, and to analyze the
typical linguistic characteristics of each move type. Finally, the typical discourse
organization of research article sections is analyzed in terms of these move types.
The integration of move analysis and multidimensional analysis provides us
with a comprehensive communicative and linguistic description of the discourse
of biochemistry research articles, underscoring the value of a corpus-based
approach.
1 Background
Previous move-based studies of scientific research articles have provided valuable
insights regarding the rhetorical moves conventionally employed in each of the
four internal sections (introduction, methods, results, discussion; see Chapter 2).
Discipline-specific variations are also discernible (e.g., Anthony, 1999; Brett, 1994;
Chu, 1996; Dubois, 1997; Naczi, Reznicek, & Ford, 1998; Swales & Luebs, 2002; D.
Thompson, 1993), suggesting that the rhetorical organization of research articles is
constrained by conventions of the academic disciplines and by the expectations of
the relevant discourse communities.
1. The material presented in this chapter is based upon dissertation research supported by the
National Science Foundation, USA, under Grant No. 0213948 and a TOEFL Grant for Doctoral Research.
Part of this material has been previously published in English for Specific Purposes, 24(3), 269-292.
However, the findings generated by these studies must be treated with caution.
First, many of these studies do not analyze a representative corpus of the discipline
they studied. Sampling problems include experts' subjective recommendation of
the journals analyzed (e.g., Nwogu, 1997; Posteguillo, 1999), reflecting individual
preferences rather than the actual academic prestige of the journals. Other sam-
pling problems include lack of specification criteria for article selection (e.g.,
Swales & Najjar, 1987; Williams, 1999), mixture of different genres (such as clinical
reports and experimental articles) in the same corpus (e.g., Williams, 1999), and
non-compatibility of journals (e.g., specialized journals and interdisciplinary jour-
nals in the same corpus in Berkenkotter & Huckin, 1995). In addition, most previ-
ous studies have focused on a single section of articles, rather than the overall or-
ganization of research articles across all four sections (introduction, methods,
results, discussion). The unsystematic and subjective selection of research articles
investigated, and the mixed and unrepresentative nature of the corpus, preclude
valid generalizations about the rhetorical organization of the target genre.
Perhaps more importantly, previous research has been limited by its exclusive
focus on rhetorical moves, with little or no attention given to the lexico-grammatical
characteristics of moves. For this reason, we know little at present about the typical
linguistic characteristics of the different move types that comprise research articles.
The study presented in this chapter is unique in several ways. First, it analyzes
the discourse structure of all four sections (introduction, methods, results,
discussion) in scientific research articles. Prior to this study, Nwogu's 1997 study was the
only one that described moves in all four sections of research articles, based on
analysis of 15 medical articles that had been recommended by medical practitioners.
While a useful initial analysis, that study was still quite restricted in scope. In
the present study, based on Swales' (1990; 2004) framework,² 60 biochemistry
research articles (which were systematically collected as part of a representative
corpus) were first analyzed for move structure. This qualitative approach was then
complemented by quantitative analysis of specific linguistic characteristics of each
move type. Some previous grammatical-rhetorical studies have described the
functions of individual linguistic features in research articles (e.g., Salager-Meyer,
1997, on hedging; D. Thompson & Ye, 1991, on reporting verbs). In general,
though, these studies do not document linguistic differences across research
article sections, and no study to date has attempted to describe systematic linguistic
differences among the move types within research article sections.
2. Swales' original model in 1990 was revised in 2004 (230, 232). The revised model consists of
three moves. Move 1: Establishing a topic is realized by topic generalizations of increasing
specificity. Move 2: Preparing for the present study (citations possible) is realized as Step 1A:
Indicating a gap or Step 1B: Adding to what is known, and Step 2: Presenting positive justification.
Move 3: Presenting the present work is realized by up to seven steps: Step 1: Announcing present
research descriptively and/or purposively, Step 2: Presenting research questions or hypotheses,
Step 3: Definitional clarifications, Step 4: Summarizing methods, Step 5: Announcing principal
outcomes, Step 6: Stating the value of the present research, Step 7: Outlining the structure of
the paper.
In contrast, the present study undertakes a detailed linguistic description of
each move type. This description incorporates analysis of 41 distinct linguistic fea-
tures, a large set of features made possible by corpus-based techniques (including
multidimensional analysis). Combining the strengths of both qualitative and quan-
titative corpus analysis tools, this study illustrates a novel and successful application
of multidimensional analysis for top-down discourse analysis: to systematically
identify the linguistic features associated with each move type (representing differ-
ent communicative purposes) and to provide a more comprehensive description of
rhetorical organization in research articles than has been previously feasible.
2 Description of the corpus
The term biochemistry was first introduced in 1903, but this field has mushroomed
so that it is now represented by 261 specialized journals published worldwide (Jour-
nal Citation Reports, 2004). As a result, it is no easy challenge to build a repre-
sentative corpus of research articles from this academic discipline, ensuring that the
articles contained in the corpus truly represent the range of research articles in bio-
chemistry. To control for possible differences among national varieties of English
and across time, only journals published in the United States in the year 2000 were
considered. The corpus was further restricted to research articles from the five most
prestigious scientific journals in biochemistry (determined by their impact
factors³): Cell (C), Molecular Cell (MC), Molecular and Cellular Biology (MCB),
Journal of Biological Chemistry (JBC), and Molecular Biology of the Cell (MBC).
From these five journals, 60 articles (12 from each journal) were randomly
selected, evenly distributed over all the issues of each journal for the year 2000.
These articles all have four distinct sections (Introduction, Methods, Results, and
Discussion). The total corpus size is about 320,000 running words.
3. The impact factor is the average number of times that articles published in a specific
journal in the two previous years were cited in a particular year. This figure is useful in
evaluating a journal's relative importance, especially when a comparison is made to other
journals in the same field.
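The sampling design described in this section (12 articles from each of five journals, spread evenly over the issues of the year) can be sketched as a stratified random draw. The details below are assumptions made for illustration: the chapter does not describe the exact selection procedure, the issue counts and article identifiers are invented, and the function name sample_journal is hypothetical. The sketch divides a journal's issues into 12 consecutive bands and draws one article at random from each band.

```python
import random


def sample_journal(issues, per_journal=12, seed=0):
    """Draw `per_journal` articles spread evenly over a journal's issues.

    `issues` is a list of issues in publication order; each issue is a list
    of article identifiers.
    """
    rng = random.Random(seed)
    n = len(issues)
    chosen = []
    for k in range(per_journal):
        # Band k covers a consecutive slice of the year's issues.
        start = k * n // per_journal
        end = max((k + 1) * n // per_journal, start + 1)
        pool = [a for issue in issues[start:end] for a in issue if a not in chosen]
        if not pool:
            raise ValueError("not enough articles to fill every band")
        chosen.append(rng.choice(pool))
    return chosen


# Hypothetical issue counts per journal; five invented articles per issue.
issue_counts = {"C": 26, "MC": 12, "MCB": 24, "JBC": 52, "MBC": 12}
corpus = {}
for journal, n_issues in issue_counts.items():
    issues = [[f"{journal}-{i}-{j}" for j in range(5)] for i in range(n_issues)]
    corpus[journal] = sample_journal(issues)

total_articles = sum(len(articles) for articles in corpus.values())
```

Fixing the random seed makes the selection reproducible, which is useful when a corpus design needs to be documented and audited.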
3 Determining the move categories in the genre of biochemistry research articles
The first step in the analysis here was to identify the move types that can occur in
each section of biochemistry research articles. This task was made easier because I
was able to build on the numerous previous studies that have identified move types
in research articles from different academic disciplines: Anthony (1999), Chu
(1996), Crookes (1986), Samraj (2002) on the move types in Introductions; Swales
& Luebs (2002), Wood (1982) on the move types in methodology sections; D.
Thompson (1993), Williams (1999) on the move types in Results sections; and
Dubois (1997) on the move types in Discussion sections.
Considering the findings from these studies, together with my own detailed
analyses of biochemistry research articles, I identified 15 move types that can oc-
cur in these texts. Several of these move types can consist of multiple sub-parts,
referred to as steps. Table 4.1 summarizes the overall framework.
Table 4.1 Model of move structure in biochemistry research articles
INTRODUCTION
Move 1: Establishing a topic
Move 2: Preparing for the present study: Indicating a gap/raising a question
Move 3: Introducing the present study
Step 1: Stating purpose(s)
Step 2: Describing procedures
Step 3: Presenting findings
METHODS
Move 4: Describing materials
Step 1: Listing materials
Step 2: Detailing the source of the materials
Step 3: Providing the background of the materials
Move 5: Describing experimental procedures
Step 1: Documenting established procedures
Step 2: Detailing procedures
Step 3: Providing the background of the procedures
Move 6: Detailing equipment
Move 7: Describing statistical procedures
RESULTS
Move 8: Restating methodological issues
Step 1: Describing aims and purposes
Step 2: Stating research questions
Step 3: Making hypotheses
Step 4: Listing procedures or methodological techniques
Move 9: Justifying methodological issues
Move 10: Announcing results
Step 1: Reporting results
Step 2: Substantiating results
Step 3: Invalidating results
Move 11: Commenting results
Step 1: Explaining results
Step 2: Generalizing/interpreting results
Step 3: Evaluating results
Step 4: Stating limitations
Step 5: Summarizing
DISCUSSION
Move 12: Contextualizing the study
Step 1: Describing established knowledge
Step 2: Generalizing, claiming, deducing previous knowledge
Move 13: Consolidating results
Step 1: Restating methodology (purposes, research questions, hypotheses, and procedures)
Step 2: Stating selected findings
Step 3: Referring to previous literature
Step 4: Explaining differences in findings
Step 5: Making overt claims/generalizations
Step 6: Exemplifying
Move 14: Stating limitations of the study
Move 15: Suggesting further research
The following sections describe the individual move types in each section and
their constituent steps.
3.1 The introduction section
Move 1: Establishing a topic assures that the topic is worth investigating and the
field is well established. Move 1 also reports previous research deemed relevant to
the topic being discussed. Move 1 usually begins the Introduction section, consist-
ing of topical statements of increasing specificity:
General Move 1 statement:
(1) Cell-cell adhesion is critical for tissues and organs. [C9]
Specific Move 1 statement:

(2) These modifications promote plasma membrane association and facilitate high-
affinity protein-protein interactions (REFERENCE). [MBC3]
Move 2: Preparing for the present study focuses on weaknesses in the existing literature
and/or unaddressed research questions. Move 2 in biochemistry establishes
a niche in previous research by the step of either indicating a gap or raising a
question, as shown in (3–4).
(3) Although these and other important roles of U2 snRNP are well known, the
critical issue of has not yet been determined. [MC5]
(4) , but it is not known whether they associate specifically with AJs. [C1]
Move 3: Introducing the present study is realized by three steps in this genre.
Step 1: Stating purpose(s) explicitly announces the purpose(s) of the study:
(5) It was undertaken to examine in detail and to try to understand .
[MCB3]
Step 2: Describing procedures focuses on the principal features of the study:

(6) We therefore investigated AJ formation in primary keratinocytes . [C1]
Step 3: Presenting findings announces the major findings of the study:

(7) Our results show that U2snRNP is associated with the E complex . [MC5]
3.2 The methods section
The methods section has four move types.

Move 4: Describing materials covers a wide range of materials used in biochemis-
try experiments, from natural substances, human/animal organs or tissues, to
chemicals. Move 4 can be realized as three variations.
Step 1: Listing materials explicitly itemizes materials or substances used:
(8) Bacterial strains used in this study are listed in Table 3. [C8]
Step 2: Detailing the source of materials identifies how these items are obtained,
such as by purchase, as a gift, etc.:
(9) COS-7 cells were obtained from S.Brandt . [MCB4]
Step 3: Providing the background of the materials includes the description, proper-
ties, or characteristics of the materials:
(10) All strains have GAL upstream activating sequence-regulated PGK1pG and
MFA2pG genes, (REFERENCE). [MCB11]
Move 5: Describing experimental procedures has three variations or steps.

Step 1: Documenting established procedures recounts established experimental
processes commonly known to biochemistry researchers:
(11) Chromatin binding assays were performed as previously described (REFER-
ENCE). [MC4]
Step 2: Detailing procedures provides a detailed description of less typical procedures
to facilitate replication in subsequent studies:
(12) To obtain polyclonal antibodies , mice and rabbits were immunized .
[MBC9]
Step 3: Providing the background of the procedures justifies the choice of technique
or procedure:
(13) Complete details of all constructions will be provided upon request. [JBC10]
Move 6: Detailing equipment (14) and Move 7: Describing statistical procedures
(15) both occur infrequently in this genre:
(14) Images were recorded through a Hamamatsu C-2400 New vicon camera using a
10 x objective and brightfield optics. Video images were digitized at a rate of 6
frames/min as described above. [MBC8]
(15) The data were fitted to the Michaelis-Menten Equation 1 by using a non-linear
least squares approach and the kinetic constants ± S.E. [JBC7]
3.3 The results section
The results section also has four move types:

Move 8: Restating methodological issues focuses on how the data of the study have
been produced. This move is realized by one or more of four steps.
Step 1: Describing aims and purposes:
(16) To examine the kinetics , we first plated keratinocytes . [C1]
Step 2: Stating research questions:
(17) To determine whether these GTPases participate in the phagocytosis of P. aeru-
ginosa, we expressed . [JBC1]
Step 3: Making hypotheses:

(18) Mondo A and Mlx heterodimerize are predicted to bind CACGTG E-box se-
quences. [MCB12]
Step 4: Listing procedures or methodological techniques:

(19) (To determine whether ,) P19 cytoplasmic extracts were incubated . Reten-
tion of MondoA Mlx heterodimers on the DNA beads was determined by West-
ern blotting. [MCB12]
Move 9: Justifying procedures or methodology reveals what determines the scientists'
decision to opt for particular experimental methods, procedures, or techniques.
This move can be expressed by referring to previous research.
(20) (DKO4 cells were used), in which mutant Ras had been detected homologous
recombination (REFERENCE) and a conditionally active Raf allele (EGFP-
Raf-1: ER) was stably expressed in these cells (REFERENCE). [C10]
Move 10: Announcing results is a crucial move of the Results section and is real-
ized by three steps. The first step reports major findings, whereas the second step
persuades the respective discourse community to consider the finding as a part of
consensual knowledge. The third step highlights the novelty produced by the study
that might be worth further investigation.
Step 1: Reporting results:
(21) Data is shown for Pse1ECFP/Nic96EYFP and Pse1ECFP/Nup188EYFP
(Figure 3). [MC1]
Step 2: Substantiating results:

(22) Similar results were obtained. [MC1]
Step 3: Invalidating results:

(23) (Full length VASP-GFP localized to adhesion zippers (Figures 6A-6D). This
was true in the majority of transfected cells .) In contrast, TD-GFP interfered
with formation of adhesion (Figures 6E-6H). [C1]
Move 11: Commenting on the results is one place where scientists not only report but
also comment on the results. Excerpts (24–28) illustrate the five steps of Move 11:
Step 1: Explaining results:
(24) We presume that the localization of GFP-tagged Ste18p is representative of na-
tive Ste18p because the wild-type fusion protein rescues mating in a ste18 strain.
[MBC3]
Step 2: Generalizing/interpreting results:

(25) These results suggest that proteolysis of c-Myc is proteasome dependent.
[MCB4]
Step 3: Evaluating results:

(26) The strong exacerbation of the phenotype of fun12 (1915).. and the lack of any
effect in tif34 support our conclusion that eIF5B and eIF1A functionally interact
during translation initiation. Moreover, the toxicity is consistent with
the model that release of eIF1A and eIF5B from 80S initiation complexes is coupled.
[MCB10]
Step 4: Stating limitations:

(27) The molecular mechanisms are unknown. It is therefore difficult to propose
an explicit model to explain why telomeres become longer . [MC10]
Step 5: Summarizing:
(28) Together, these results demonstrate that reg A- cells are capable of assessing the
direction of a spatial gradient of cAMP . [MBC8]
3.4 The discussion section
The discussion section is also comprised of four possible move types:

Move 12: Contextualizing the study has two distinct steps.
Step 1: Describing established knowledge cites or reports related previous research
or established knowledge of the topic that is crucial in understanding what is being
presented:
(29) Conventional kinesin has long been suspected of being a vesicle motor. Initially,
this stemmed from its discovery in axoplasm (REFERENCE), which is rich in
Golgi-derived transport vesicles, and its co-localization with vesicles in cultured
cells (REFERENCE). [MBC8]
Step 2: Generalizing, claiming, deducing previous knowledge describes how the
findings relate to the results of previous research:
(30) The observation that BAD is inactivated by phosphorylation at Ser-155 has
important implications for the understanding of the regulation of Bcl-2 family
members. [MC7]
Move 13: Consolidating results highlights the strengths of the study and defends its
importance. This move is realized through six steps:
Step 1: Restating methodology:
(31) In this study, we exploited primary culture to examine the impact that elevated
K16 protein level has on a number of basic properties of skin keratinocytes.
[MBC10]
Step 2: Stating selected findings:

(32) We show that the essential Gpi11 and Gpi13 proteins are involved in late stages
in the formation of the yeast GPIs, and we identify and characterize three new
candidate GPI precursors. [MBC5]
Step 3: Referring to previous literature for comparison:

(33) The experiments presented here confirm the previously reported data (REFER-
ENCE), showing that . [JBC4]
Step 4: Explaining differences in findings:

(34) The advantages the Ku-X4-LIV complex confers upon ligation in vitro can there-
fore explain why these factors are required for cellular end joining: ligation is
fast and efficient, even at low enzyme and in the presence of unbroken
DNA. [MBC5]
Step 5: Making overt claims/generalizations:

(35) (Simply changing the CaaX motif to a form recognized by Ftase signifi-
cantly improved mGBP1 modification.) This result also indicates that the CaaX
motif is not likely to be buried within the structure of the protein, .
[MBC7]
Step 6: Exemplifying:
(36) (Within the G88R RNase A variants, cytotoxicity correlates well with conforma-
tional stability (Fig.2).) For example, A4/G88/V118C Rnase has the highest Tm
value of the five enzymes and is the most potent cytotoxin. [JBC6]
Move 14: Stating limitations of the present study makes explicit the scientists'
views of the limitations of the study about the methodology, the findings, and/or
the claims made based on the findings:
(37) Additionally, some interactions may be too transient for detection by FRET.
[MC1]
(38) Our data do not enable us to rule out a requirement for additional, non-PMA-
activated pathways in the activation of splicing in primary T cells. [MBC1]
Move 15: Suggesting further research allows the scientists to offer recommenda-
tions for the course of future research by pinpointing particular research questions
to be addressed or improvements in research methodology:
(39) Further analysis of the molecular basis of motor axon guidance in the limb may
help to define two interrelated issues in the patterning of neuronal projections.
. [C7]
4 Coding moves in the corpus of biochemistry research articles
As mentioned in Chapter 2, the subjective nature of move identification presents a
methodological challenge for corpus-based research, which requires a systematic
identification and coding of all moves in the corpus (e.g., Crookes, 1986; Dudley-
Evans, 1994a; Paltridge, 1994). As a result, two individuals analyzing the same text
type may differ in ascribing move boundaries or in identifying the move type of
each move (as in the studies by Nwogu, 1997, and Williams, 1999, on the Results
section in medical research articles). Therefore, it was necessary to assess inter-
coder reliability of move assignment for the present project, ensuring that move
demarcation could be conducted consistently by different individuals, and that the
framework for determining move type could be applied reliably.
In the present case, I evaluated the reliability of my own coding in comparison
to the coding of an expert in the field of biochemistry: a PhD student at an Amer-
ican university who is also a faculty member in the School of Pharmacy at Silpa-
korn University in Thailand. Although the expert coder is not a native speaker of
English, he clearly possesses extensive experience and expertise in reading aca-
demic research articles in the field of biochemistry.
A two-hour training session for each section was conducted to explain the
purpose of the task and to acquaint the coder with the use of the analytical framework
(described in Section 3 and Table 4.1). Texts were segmented into moves,
and the move type of each move was determined. Only one rhetorical move was
ascribed to a segment of a text. Texts were not coded for steps. The list of steps
constituting each move was used to facilitate the coder's decision in ascribing
moves; however, the step distinctions played no role in the subsequent analyses.
In the second stage of training, both raters coded four randomly selected texts
representing the four conventional sections. We then went through each text to
identify any coding disagreements. Differences in coding led to discussion and
clarification of the criteria for coding assignments.
Finally, the raters each independently coded 15 research articles (three articles
from each of the five journals). Based on the independent coding by the author
and the expert coder, inter-coder reliability was measured by agreement rate or
percentage agreement and kappa value. Percentage agreement does not take
into account chance agreement between two coders, whereas the kappa value does
(Orwin, 1994).
Table 4.2 Summarized results of inter-coder reliability analysis
Section        Kappa   Percent
Introduction   .93     97.58
Methods        .81     96.35
Results        .88     93.02
Discussion     .88     93.02
Average        .89     95.03
Table 4.2 shows high overall inter-coder reliability as measured by both agreement
rates and kappa4 values. Moves in the Introduction section were more consistently
and reliably identified than those in the other sections. In contrast, the Methods
section displayed more divergence in move identification. However, there seemed
to be no systematic pattern regarding divergences in move coding. The findings
suggest the psychological reality of a move as a discourse unit that can be empiri-
cally investigated further.
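The two reliability measures reported in Table 4.2 can be illustrated with a short sketch. It is not the procedure used in the study: it assumes that each coder's decisions are available as parallel lists of move-type labels for the same text segments, and the function names and toy labels are hypothetical. The interpretation bands follow the Fleiss thresholds cited in note 4.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Percentage of segments that both coders assigned to the same move type."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of segments")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b) / 100.0
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: both coders pick the same label independently.
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


def interpret_kappa(kappa):
    """Map a kappa value onto the Fleiss bands cited in note 4."""
    if kappa < 0.40:
        return "poor"
    if kappa < 0.60:
        return "fair"
    if kappa < 0.75:
        return "good"
    return "excellent"


# Hypothetical move-type labels for ten segments (not data from the corpus).
author = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
expert = [1, 1, 2, 2, 3, 3, 1, 2, 3, 2]

agreement = percent_agreement(author, expert)
kappa = cohen_kappa(author, expert)
label = interpret_kappa(kappa)
```

In this toy case the coders agree on nine of ten segments, yet kappa is lower than the raw agreement rate because part of that agreement would be expected by chance alone.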
5 Distribution of move types within texts from the biochemistry corpus
One major goal of move analysis is to identify the primary communicative function
(the move type) of each statement in a text. Thus, when Introductions in
biochemistry research articles are described as being composed of three move
types, this means that every statement in the Introduction can be attributed to one
of these three types.
However, it is not the case that move types necessarily occur sequentially in a
text. For example, an Introduction will not necessarily be composed of sentences
belonging to Move Type 1 (Establishing a topic), followed by Move Type 2 (Prepar-
ing for the present study), followed by Move Type 3 (Introducing the present study).
Rather, these three move types, and their associated communicative functions, can
be interspersed throughout the Introduction. A move type represents a particular
communicative function, and a text often switches from one move type (communicative
function) to another and then back again to the first. Each of these text
segments is coded as a separate move, resulting in the possibility of multiple
moves representing a single move type.
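The distinction between moves and move types can be made concrete with a small sketch. It assumes, purely for illustration, that each statement in a section has already been labeled with a move type; the label sequence and the function name are hypothetical. Each run of consecutive statements with the same label then counts as one move.

```python
from itertools import groupby


def segment_into_moves(statement_move_types):
    """Collapse runs of consecutive statements with the same move type into moves.

    Returns a list of (move_type, number_of_statements) pairs, one per move.
    """
    return [(move_type, len(list(run)))
            for move_type, run in groupby(statement_move_types)]


# Hypothetical labels for the 13 statements of an Introduction.
labels = [1, 1, 1, 1, 2, 1, 1, 2, 2, 2, 3, 1, 3]

moves = segment_into_moves(labels)
n_moves = len(moves)                               # individual moves
n_move_types = len({move_type for move_type, _ in moves})  # distinct move types
```

Here three move types yield seven moves, because Move Types 1, 2, and 3 each recur after the text has switched to another function.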
The following text excerpt illustrates how the language of an Introduction can
be attributed to different moves:
4. According to Fleiss (as cited in Orwin, 1994), the interpretation of Cohen's kappa is summarized
as follows: k < .40 poor, .40 ≤ k ≤ .59 fair, .60 ≤ k ≤ .74 good, and k ≥ .75 excellent.
(40) Introduction [C6]

(Move Type 1) Small RNAs (sRNAs) in E. coli were first described in 1967. To
date, more than ten sRNAs are known to be encoded by the E. coli genome
(REFERENCE). These sRNAs act by mechanisms . The sRNAs regu-
late diverse cellular functions . (Move Type 2) Interestingly, however, the
function of E. coli 6S RNA has been unknown.
(Move Type 1) The 6S RNA was first detected as an abundant RNA (REF-
ERENCE). It is transcribed as part of a message that contains the gene en-
coding 6S RNA (ssrS) (REFERENCE). (Move Type 2) The mechanism of
processing 6S RNA has not been characterized. The function of the
protein also is not known . The lack of a reported phenotype has pre-
cluded finding a function for the 6S RNA. (Move Type 3) Here, we show that
the 6S RNA forms a complex with RNA polymerase.
(Move Type 1) Gram-negative bacteria enter stationary phase upon nutrient
limitation (REFERENCE).... (Move Type 3) Here, we show that 6S RNA binds
to the σ70-holoenzyme form of RNAP .
The beginning of paragraph one contextualizes the study by general statements
introducing the topic of sRNAs (e.g., Small RNAs (sRNAs) in E. coli were first described
in 1967, Move Type 1), followed by a relatively more specific statement
regarding the role of sRNAs in regulating cellular functions. Then, the excerpt pinpoints
the gap in previous research (the function of E. coli 6S RNA has been unknown,
Move Type 2).
The second paragraph similarly begins with Move Type 1, by citing a previous
study on 6S RNA (e.g., The 6S RNA was first detected as an abundant RNA); this is
coded as the third move in the introduction. The following sentence, which is the
fourth move in the Introduction, presents a second statement of the gap ( The
mechanism has not been characterized, Move Type 2). Then, the excerpt presents
a summary of findings from the present study (Here, we show that the 6S RNA
forms a complex with RNA polymerase, Move Type 3).
The last paragraph begins by providing background knowledge on another aspect
of 6S RNA (Gram-negative bacteria enter stationary phase upon nutrient limitation
(REFERENCE), Move Type 1) and concludes with another statement of present
findings (Here, we show that 6S RNA binds to the σ70-holoenzyme , Move Type 3).
The other article sections are structured in similar ways. The important point
to note here is that research articles are not structured as a fixed linear sequence
of move types. Rather,
move analysis allows the statements of a research article to be parceled out among
a closed set of communicative functions: the move types. The numbers assigned to
the moves in the coding framework (Table 4.1) reflect the order in which the
moves often appeared in these research articles. Similarly, the constituent steps of
each move are sequenced to reflect the common orders found in the corpus. However,
this type of analysis is not intended to directly describe the discourse organization
of texts: variation in the order of both moves and steps is possible, and it
is common to find multiple statements distributed throughout an article section
all belonging to a single move type.
Table 4.3 Overall distribution of the 15 move types
Section / Move type                            Frequency of   No. of observations   No. of words     Min/Max
                                               occurrence     (N = 5,617)           (N = 315,667)    (4/490)
Introduction (425 observations, 38,655 words)
Move 1: Establishing a topic                   100.00%        264                   29,243           9/341
Move 2: Preparing for the present study        66.66%         83                    2,463            8/103
Move 3: Introducing the present study          100.00%        78                    6,949            20/490
Methods (657 observations, 62,761 words)
Move 4: Describing materials                   100.00%        110                   4,036            4/132
Move 5: Describing experimental procedures     100.00%        525                   57,694           5/420
Move 6: Detailing equipment                    10.00%         6                     271              11/96
Move 7: Describing statistical procedures      13.32%         16                    760              6/134
Results (3,393 observations, 131,312 words)
Move 8: Stating procedures                     95.07%         828                   29,561           13/174
Move 9: Justifying methodological issues       71.59%         438                   15,668           15/165
Move 10: Announcing results                    100.00%        1,233                 58,982           10/217
Move 11: Commenting results                    91.01%         894                   27,101           15/157
Discussion (1,142 observations, 82,939 words)
Move 12: Contextualizing the study             89.94%         431                   29,730           12/303
Move 13: Consolidating results                 100.00%        602                   50,212           8/351
Move 14: Stating limitations of the study      80.00%         59                    1,657            14/114
Move 15: Suggesting further research           53.33%         50                    1,340            6/63
Using the coding framework described above, all moves in the corpus were identi-
fied and assigned to one of the 15 move types. Table 4.3 shows the overall distribu-
tion of move types across the 60 research articles that comprise the corpus. The
table also shows that the move types are not equally well represented in these re-
search articles. The move types differ in that some are obligatory5, some are op-
tional but normally present, while others are optional and rare.
The descriptions presented in this section demonstrate that Swales' model for
describing the moves in Introductions was successfully extended to other well-defined
sections of biochemistry research articles. In this study, the move framework
described in Table 4.1 is successfully applied to the entire corpus of texts,
yielding a comprehensive description of all communicative purposes employed to
construct the research articles, the first goal of this study. In order to accurately
and thoroughly describe research articles, the typical linguistic characteristics of
each move type need to be identified, the second goal of the study. Multidimen-
sional analysis is used to empirically characterize the move types identified by
move analysis; the procedures and results for this stage of the study are described
in the following sections.
6 Linguistic characteristics of rhetorical moves in biochemistry research articles
As described in Chapter 2, move analyses have typically had two primary goals: to
identify the major communicative purposes found in the texts from a genre (the
move types), and to identify the individual moves that comprise particular texts
from that genre. Move analyses generally do not describe the linguistic character-
istics of move types. In part, this restriction is a consequence of the methods used
in previous research, which were based on analysis of a small number of texts from
the target genre. Such analyses did not provide the basis for generalizable findings
regarding the typical linguistic characteristics of move types. However, by extend-
ing this analytical approach to a representative corpus of texts, we are able to iden-
tify the typical linguistic patterns of variation among move types.
Multi-dimensional (MD) analysis was used in the present case to provide a
comprehensive linguistic description of the biochemistry moves and move types.
(MD analysis is introduced in Chapter 1 and described more fully in Appendix
One.) In a preliminary step, the corpus texts were further edited to facilitate quantitative
linguistic analyses. For example, citations (e.g., Nose et al., 1988) were replaced
by Ref. to avoid artificial inflation of word counts. In addition, all references
to tables or figures were replaced by Pointer.
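The kind of editing described here can be sketched with two regular expressions. This is only an illustration under stated assumptions: the chapter does not say how the replacements were carried out, the patterns below are hypothetical and cover only simple author-date citations and numbered figure or table references, and real journal formats vary considerably.

```python
import re

# Hypothetical pattern for simple author-date citations such as "Nose et al., 1988".
CITATION = re.compile(
    r"[A-Z][A-Za-z'-]+(?: et al\.| and [A-Z][A-Za-z'-]+)?,? \d{4}[a-z]?"
)
# Hypothetical pattern for references such as "Figure 3", "Figures 6A-6D", "Table 3".
POINTER = re.compile(r"(?:Figures?|Figs?\.|Tables?) \d+[A-Z]?(?:-\d+[A-Z]?)?")


def normalize_references(text):
    """Replace citations with 'Ref.' and figure/table references with 'Pointer'."""
    text = CITATION.sub("Ref.", text)
    return POINTER.sub("Pointer", text)


# Invented sentence, used only to exercise the patterns.
sample = "Cadherins mediate adhesion (Nose et al., 1988), as shown (Figure 3)."
cleaned = normalize_references(sample)
```

Each citation or pointer is thereby reduced to a single token, so a four-word citation no longer counts as four running words.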
5. The cut-off frequency of 60% of occurrence was arbitrarily established as a potential measure
of move stability for any move posited in this study. A move occurring in at least 60% of the relevant
article sections in the corpus was considered an obligatory move. If a move's occurrence was
lower than 60%, it was considered optional.
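The cut-off in note 5 can be applied directly to the frequencies of occurrence listed in Table 4.3. The sketch below is an illustration only; the dictionary layout and function name are assumptions, while the percentages are copied from the table.

```python
OBLIGATORY_CUTOFF = 60.0  # percent of occurrence, the cut-off stated in note 5

# Frequency of occurrence (%) per move type, as listed in Table 4.3.
occurrence = {
    1: 100.00, 2: 66.66, 3: 100.00,
    4: 100.00, 5: 100.00, 6: 10.00, 7: 13.32,
    8: 95.07, 9: 71.59, 10: 100.00, 11: 91.01,
    12: 89.94, 13: 100.00, 14: 80.00, 15: 53.33,
}


def classify_move(frequency):
    """Label a move type as obligatory or optional using the 60% cut-off."""
    return "obligatory" if frequency >= OBLIGATORY_CUTOFF else "optional"


status = {move: classify_move(freq) for move, freq in occurrence.items()}
optional_moves = sorted(move for move, label in status.items() if label == "optional")
```

Under this criterion only Move 6, Move 7, and Move 15 fall below the cut-off; Move 2, at 66.66%, is the obligatory move closest to it.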
As shown in Table 4.3 above, move analysis was undertaken to segment the 60
original research articles into 5,617 individual moves, which were each coded for
their move type. These moves ranged widely from 4 to 490 words, with an average
observation length of 56 words. For the MD analysis, move segments shorter than
25 words6 were excluded, because it is not possible to obtain reliable counts of
linguistic features in shorter segments. In addition, Move Type 6 and Move Type 7
were excluded from the MD analysis, because they were represented by only 4 and
11 observations, respectively. Thus, the corpus used for the MD analysis consisted
of 4,009 moves (comprising 287,607 words) with an average length of 71.8 words.
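The selection step and the reported averages can be checked with a few lines of arithmetic. The filter below is a sketch that assumes each move is stored as a (move type, word count) pair, which is a hypothetical layout; the toy moves are invented, whereas the totals are the figures reported in the text.

```python
def select_for_md(moves, min_words=25, excluded_types=(6, 7)):
    """Keep moves with at least `min_words` words whose type is not excluded."""
    return [(move_type, n_words) for move_type, n_words in moves
            if n_words >= min_words and move_type not in excluded_types]


# Invented (move_type, word_count) pairs, for illustration only.
toy_moves = [(1, 120), (2, 18), (5, 240), (6, 40), (10, 25), (11, 9)]
kept = select_for_md(toy_moves)

# Totals reported in the text for the full corpus and for the MD subcorpus.
ALL_MOVES, ALL_WORDS = 5617, 315667
MD_MOVES, MD_WORDS = 4009, 287607

mean_length_all = ALL_WORDS / ALL_MOVES          # about 56 words per move
mean_length_md = MD_WORDS / MD_MOVES             # about 72 words per move
share_moves_kept = 100 * MD_MOVES / ALL_MOVES    # about 71% of the moves
share_words_kept = 100 * MD_WORDS / ALL_WORDS    # about 91% of the words
```

The arithmetic shows why the exclusion is inexpensive: roughly 29% of the moves are dropped, but they account for only about 9% of the running words.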
This corpus was tagged by the Biber tagger; the automatic tagging proved to
be 98–99% accurate with no systematic tagging errors. A wide range of linguistic
features were counted in each move, including the range of lexico-grammatical
features used in previous MD analyses (see Appendix Two), as well as more specialized
features that have been analyzed in scientific research articles: e.g., pointers
(see Brett, 1994); references (see Swales, 1990; Hyland, 2000); and extraposed
it constructions (see Biber et al., 1999; Hewings & Hewings, 2002). These features
were later collapsed into super-ordinate categories based on their similar
functions. For example, demonstrative adjectives and demonstrative pronouns
were aggregated into one demonstratives feature. Frequencies of the features in
each text were counted and normalized to a rate per 100 words, so that compari-
sons could be made across texts. The normalized frequencies of linguistic features
provide the basis for factor analysis.
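Collapsing features and normalizing the counts amount to simple arithmetic, sketched below. The feature names, counts, and function names are hypothetical; only the rate per 100 words and the merging of the two demonstrative categories come from the text.

```python
def rate_per_100_words(count, n_words):
    """Normalize a raw feature count to a rate per 100 words of text."""
    if n_words <= 0:
        raise ValueError("a move must contain at least one word")
    return 100.0 * count / n_words


def collapse(counts, members, new_name):
    """Merge several raw feature counts into one super-ordinate feature."""
    merged = {name: n for name, n in counts.items() if name not in members}
    merged[new_name] = sum(counts.get(name, 0) for name in members)
    return merged


# Invented raw counts for a single 60-word move.
raw_counts = {"demonstrative_adjectives": 2, "demonstrative_pronouns": 1, "passives": 3}
counts = collapse(
    raw_counts,
    ["demonstrative_adjectives", "demonstrative_pronouns"],
    "demonstratives",
)
rates = {name: rate_per_100_words(n, 60) for name, n in counts.items()}
```

Normalization is what makes moves of different lengths comparable: three passives in a 60-word move and six passives in a 120-word move both correspond to a rate of 5 per 100 words.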
MD analysis was used to identify the basic parameters of linguistic variation
among moves. In MD analysis, the distribution of many linguistic features is analyzed
in each text of a corpus. Then factor analysis is used to identify the systematic
co-occurrence patterns among those linguistic features: the dimensions.
There are two major quantitative steps in an MD analysis: (1) identifying the sali-
ent linguistic co-occurrence patterns in a language; and (2) comparing texts and
genres/registers in the linguistic space defined by those co-occurrence patterns. In
the present case, once the dimensions of variation are identified, moves and move
types can be compared along each dimension.
Forty-one linguistic features had strong patterns of variation in this corpus
and were thus retained in the final factor analysis, which accounted for 33.5% of
the total variance. The solution for 7 factors was selected as optimal; Table 4.4
summarizes the co-occurring linguistic features grouped on each of these factors.
6. Based on the 25-word criterion, about 29% of the observations in the corpus were excluded,
leaving 71% of the original observations. The retained observations comprise 91% of all words
in the original corpus and have an average observation length of 71.8 words.
Table 4.4 Summary of the linguistic features associated with each factor
Factor 1
Word length                        .750
All attributive adjectives         .719
Common nouns                       .509
Numerals                          -.522
Technical jargon                  -.421
Factor 2
Passives                           .530
Past tense verbs                   .516
All coordinating conjunctions      .361
Definite articles                 -.605
Nominalizations/gerunds           -.536
Prepositions                      -.453
All modals                        -.328
Factor 3
Extraposed it                      .906
That clause cont by adjectives     .857
Predicative adjectives             .450
(To clause cont by adjectives      .326)*
No negative features
Factor 4
All demonstratives                 .899
Quantifiers                        .886
That clause cont by verbs          .342
No negative features
Factor 5
All present tense verbs            .720
References                         .508
Type token ratio or TTR            .392
(Common nouns                     -.405)
(Past tense verbs                 -.378)
(Pointers                         -.357)
(Prepositions                     -.325)
Factor 6
To infinitives                     .764
Whether/if                         .470
To clause cont by verbs            .447
Person 1                           .431
To clause cont by adjectives       .351
(Prepositions                     -.405)
(Type token ratio                 -.320)
Factor 7
Concession                         .660
Pointers                           .557
Not negation                       .545
All adverbs                        .538
No negative features
* Features in parentheses are not used to compute dimension scores.
Each factor comprises a set of linguistic features that tend to co-occur in the moves
from the biochemistry corpus. Factors are interpreted as underlying dimensions of
variation based on the assumption that linguistic co-occurrence patterns reflect un-
derlying communicative functions. That is, particular sets of linguistic features co-
occur frequently in texts because they serve related communicative functions. In the
present study, the following interpretive labels are proposed for each dimension:
Dimension 1: Conceptual vs. Specific Reference
Dimension 2: Concrete Action vs. Abstract Discussion
Dimension 3: Evaluative Stance
Dimension 4: Projected Interpretation
Dimension 5: Attributed Knowledge vs. Current Study
Dimension 6: Stated Purpose
Dimension 7: Contradictory Proposition
The following sections describe these interpretations and the multi-dimensional
characteristics of each move type.
7 Linguistic variation among move categories in biochemistry research articles
To describe the multi-dimensional characteristics of each move type, it is first necessary
to compute factor scores for each move with respect to each factor. It is then
possible to compute and compare the mean factor scores for each move type on
each dimension.
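The computation of such scores is not spelled out at this point in the chapter. The sketch below assumes the convention commonly used in MD studies: each feature's normalized rate is standardized across all moves, and a move's dimension score is the sum of its standardized rates for the positive-loading features minus the sum for the negative-loading features. The feature names follow Factor 1 in Table 4.4; the rates, the function names, and the convention itself are assumptions made for illustration.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize a list of rates to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        raise ValueError("a feature with no variation cannot be standardized")
    return [(value - mu) / sigma for value in values]


def dimension_scores(feature_rates, positive, negative):
    """Sum of z-scores of positive features minus sum of z-scores of negative ones.

    `feature_rates` maps each feature name to a list of rates, one per move.
    """
    z = {name: z_scores(feature_rates[name]) for name in positive + negative}
    n_moves = len(z[positive[0]])
    return [
        sum(z[name][i] for name in positive) - sum(z[name][i] for name in negative)
        for i in range(n_moves)
    ]


# Invented rates for three moves (not corpus data).
rates = {
    "word_length": [5.2, 4.1, 4.8],
    "attributive_adjectives": [9.0, 3.0, 6.0],
    "common_nouns": [30.0, 20.0, 25.0],
    "numerals": [1.0, 9.0, 5.0],
    "technical_jargon": [2.0, 8.0, 5.0],
}
scores = dimension_scores(
    rates,
    positive=["word_length", "attributive_adjectives", "common_nouns"],
    negative=["numerals", "technical_jargon"],
)
```

With these toy values the first move, rich in long words and modified nouns, receives the highest score, and the second move, dense in numerals and jargon, the lowest. Mean scores per move type would then be obtained by averaging the scores of the moves coded with that type.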
Table 4A at the end of the chapter provides descriptive dimension statistics for
all move types, while Table 4.5 below shows the results of ANOVAs that test for
significant differences among the mean scores for each move type. Table 4.5 shows
that the differences among move types are statistically significant with respect to
all seven dimensions. However, the r² values are not especially large, indicating
that there is also considerable linguistic variation among the moves within some
of these move types. For example, the mean dimension score of Move Type 4 on
Dimension 1 is -8.81, reflecting that the moves in this move type have low fre-
quencies of long words, attributive adjectives, and common nouns (the features
with positive loadings on Factor 1) and high frequencies of numerals and technical
jargon (the features with negative loadings on Factor 1). However, Move Type 4
actually shows a wide range of linguistic variation in the use of Dimension 1 fea-
tures, with a standard deviation of 14.5, and a total range of Dimension 1 scores
extending from -41.5 to 20.27 (See Table 4A). Thus some move types show a wide
range of internal linguistic variation, while other move types are relatively well
defined in their linguistic characteristics.
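The relation between the two statistics discussed here can be illustrated with a one-way ANOVA computed by hand. The sketch assumes that r² is the proportion of total variance in dimension scores accounted for by move type (between-group sum of squares divided by total sum of squares); the groups of scores and the function name are invented for illustration.

```python
def one_way_anova(groups):
    """Return (F, r_squared) for a list of groups of scores.

    F compares between-group and within-group mean squares; r_squared is
    taken here as SS_between / SS_total, which is an assumption.
    """
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    ss_between = sum(len(group) * (m - grand_mean) ** 2
                     for group, m in zip(groups, group_means))
    ss_within = sum((score - m) ** 2
                    for group, m in zip(groups, group_means)
                    for score in group)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, r_squared


# Invented dimension scores for three move types.
toy_groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
f_stat, r_squared = one_way_anova(toy_groups)
```

A significant F value indicates that the move-type means differ, while r² shows how much of the variation those differences account for; a modest r² is therefore compatible with a wide spread of scores inside individual move types.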
Table 4.5 ANOVA results on dimension score differences across 13 move types
Dimension     F         Sig.   r²
Dimension 1   39.013    .000   45.2%
Dimension 2   94.124    .000   34.5%
Dimension 3   26.440    .000   8.1%
Dimension 4   64.409    .000   15.6%
Dimension 5   114.731   .000   16.0%
Dimension 6   103.316   .000   16.1%
Dimension 7   98.299    .000   17.4%
Overall, though, each of these dimensions identifies significant linguistic differences
among the move types. The following subsections describe each dimension
in turn, providing a fuller functional interpretation, as well as discussion of how
the move types differ with respect to the dimension.
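As a rough illustration of what the F and r² values in an ANOVA of this kind summarize, the sketch below computes a one-way ANOVA by hand. It assumes that r² is the proportion of total variance in dimension scores accounted for by move type (the between-group sum of squares divided by the total sum of squares). The function name and the toy scores are hypothetical and are not taken from the study's data.

```python
def one_way_anova(groups):
    """Return (F, r_squared) for a dict mapping group label -> list of scores."""
    all_scores = [x for scores in groups.values() for x in scores]
    n = len(all_scores)          # total number of observations (moves)
    k = len(groups)              # number of groups (move types)
    grand_mean = sum(all_scores) / n

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    # Total variation of individual scores around the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_within = ss_total - ss_between

    f_value = (ss_between / (k - 1)) / (ss_within / (n - k))
    r_squared = ss_between / ss_total  # assumed definition of r²
    return f_value, r_squared


# Hypothetical dimension scores for two move types.
f_value, r_squared = one_way_anova({"Move A": [1.0, 2.0, 3.0],
                                    "Move B": [4.0, 5.0, 6.0]})
print(f"F = {f_value:.3f}, r2 = {r_squared:.1%}")
```

Under this reading, a significant F combined with a modest r² means that the move-type means differ reliably while much of the variation still lies among the individual moves within each type.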
Dimension 1: Conceptual vs. Specific Reference

The positive-loading features on Factor 1 (see Table 4.4) include word length, at-
tributive adjectives, and common nouns. Word length, the highest loading feature
on Factor 1, refers to the average length of the words in a text measured in ortho-
graphic letters. The higher the average word length of a text is, the higher its infor-
mational density (Biber, 1988; Zipf 1949). Attributive adjectives allow scientists to
successfully describe, clarify, and qualify additional information about scientific
phenomena or entities, and common nouns are used generally to refer to entities
or concepts (Biber et al., 1999). The co-occurrence of these features reflects the
dense use of modified noun phrases and long (technical) words, resulting in high
information density.
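The word-length measure described above (average word length in orthographic letters) can be illustrated with a short Python sketch. The tokenization rule here is an assumption made for illustration: a word is taken to be any unbroken run of ASCII letters, so hyphenated forms are split and digits are ignored. The tool actually used in the study may count words differently, and the function name is hypothetical.

```python
import re


def mean_word_length(text):
    """Average number of letters per word, with words as runs of ASCII letters."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        raise ValueError("text contains no alphabetic words")
    return sum(len(word) for word in words) / len(words)


# A phrase built from long technical words scores higher than a short, plain one.
print(mean_word_length("posttranscriptional gene silencing factors"))
print(mean_word_length("cells were used"))
```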
The negative-loading features on Factor 1 are numerals and technical jargon
(abbreviations or acronyms used specifically in biochemistry writing). Whereas at-
tributive adjectives provide a more conceptual description of referents, numerals
provide a much more specific description, particularly regarding the exact quantity
of referents. The complementary distribution of these two features suggests that
they serve complementary functions: attributive adjectives provide conceptual elaboration,
while numerals add rigorous explicitness for accurate and specific identification
of procedures, required for later replication (e.g., 46% of cell, 3 hrs at 4°C).
The claim that specific reference is an important function underlying the
negative pole of Factor 1 is further supported by the high frequency of technical
jargon on this factor. In opposition to long words, technical jargon are abbreviated
terms used commonly in this discipline. For instance, the noun phrase posttran-
scriptional gene includes a long word (an attributive adjective), identifying a spe-
cific technical attribute that is relevant to a particular study. In contrast, the techni-
cal jargon term RNA is used widely in these articles as an abbreviated way to refer
to Ribonucleic Acid.
Although both are related to referential information in scientific discourse, the
two poles of Dimension 1 reflect complementary functions. With the positive-loading
features, the description of nominal elements is relatively conceptual, with a
high informational density. In contrast, greater precision and specificity of reference
can be achieved by using the negative-loading features. Based on these considerations,
the interpretive label Conceptual vs. Specific Reference is proposed for
the functional dimension underlying this factor.
Figure 4.1 shows the distribution of move types along Dimension 1. Move Type
2 (Preparing for the present study) has the highest Dimension score, characterized
by dense conceptual reference. The following example illustrates the linguistic fea-
tures associated with conceptual reference in this move type, including long words
(bolded), attributive adjectives (underlined), and common nouns (italicized).
(41) MOVE 2 (F1 score = 38.69)
While the cloned genes and mutant strains provide hints and useful tools for
future studies, direct biochemical roles for the genetically identified posttran-
scriptional gene silencing factors have yet to be assigned.
Move Type 2 prepares readers for the current study by identifying the research
gap: in this case, the absence of direct biochemical roles for the genetically identified
posttranscriptional gene silencing factors. This example illustrates the dense use of
long technical words, especially nouns and attributive adjectives, establishing the
need for a study by identifying the research gap.
In contrast, the linguistic features that characterize the other end of Dimen-
sion 1 reflect more specific and concise identification of referents. Move Types 4
and 5, both from Methods sections, have the largest negative scores here (see Fig-
ure 4.1). Example (42) from Move Type 4 (Describing materials) illustrates the
relative absence of positive Dimension 1 features and a high occurrence of nega-
tive features: numerals (bolded) and technical jargon (underlined).
(42) MOVE 4 (F1 score = -23.93)
For each pulldown, glutathione-agarose beads containing approximately 10 µg
of bound purified GST-Nup501, GST-Nup502, or GST were used.
Example (42), focusing on specific rather than conceptual information, is dense
with numerals and technical jargon, describing the methods in a way that permits
the validation of the results and future replication. Examples (41) and (42) thus
represent the two contrasting communicative functions of this dimension. The
linguistic features defining both poles might be associated with informational
density; the main difference is that the negative features are associated with the
demands of precise reference rather than abstract conceptual information.
It is interesting that Move Type 5 (Describing procedures) and Move Type 8
(Restating methodological issues), despite their resemblance in terms of their com-
municative functions, are linguistically quite different on Dimension 1. Move Type
5, which occurs in Methods sections, has a relatively large negative score on Di-
mension 1, suggesting its greater precision and specificity of reference. In contrast,
Move Type 8 occurs in Results sections and has a moderate positive score along
Dimension 1, indicating mixed use of conceptual and specific reference. Overall,
the relationships among move types shown in Figure 4.1 confirm the interpretation
of Dimension 1 as distinguishing among texts along a continuum of Conceptual
vs. Specific Reference.
Dimension 2: Concrete Action vs. Abstract Discussion

Figure 4.2 presents the distribution of move types for Dimension 2, allowing com-
parison of all moves along a continuous parameter of variation labeled Concrete
Action vs. Abstract Discussion. Move Type 4 (Describing materials), which occurs in
Methods sections, has the largest positive dimension score. Passives, the highest
positive-loading feature on Factor 2, are used to identify where the research materi-
als were obtained. There is no need to identify the agent (obviously the researcher)
for these statements, and past tense is usually used to document these procedural
activities. Example (43) illustrates this move type, highlighting passives (bolded),
past tense verbs (underlined), and coordinating conjunctions (italicized):
(43) MOVE 4 (F2 score = 15.73)
Donkey anti-rabbit IgG-peroxidase and sheep anti-mouse IgG-peroxidase
were obtained from Amersham Life Science, and mouse anti-goat IgG-perox-
idase was from Jackson ImmunoResearch Laboratories.
At the other extreme, Move Type 15 (Suggesting further research) has the largest
negative dimension score. The negative-loading Factor 2 features include definite
articles, nominalizations/gerunds, prepositions, and modals. The definite article the
is used for noun phrases that have been previously evoked or are known to the
reader, while nominalizations/gerunds refer to abstract concepts. The co-occurrence
of the definite article and nominalizations/gerunds on Factor 2 indicates the
authors' focus on abstract information that they construct as given information.
Example (44) illustrates the relative absence of positive Dimension 2 features
and a high frequency of negative features, particularly definite articles (bolded),
nominalizations/gerunds (italicized), prepositions (capitalized), and
modals (underlined).
(44) MOVE 15 (F2 score = -23.41)
A question arises AS TO how an integral membrane protein may be able to
interact WITH p38JAB1 and why this interaction occurs mostly WITH the
68-kDa precursor present IN the endoplasmic reticulum AS opposed TO the
85-kDa mature receptor present IN the plasma membrane. This issue will also
have to be addressed experimentally.
It is interesting to note that all move types that occur in Discussion sections (Move
Types 12–15) have large negative scores on Dimension 2. In addition, all three
move types from the Introduction (Move Types 1–3) have large negative Dimension 2
scores. Thus we see here how these articles begin and end with relatively
abstract discussion, while the more concrete actions are described in the two intervening
sections.
Dimension 3: Evaluative Stance

The positive features on Dimension 3 are mostly that complement clauses. In these
constructions, the authors' stance is given in the main clause, and the propositional
information is given in the that complement clause (e.g., it is possible that we
did not detect). The heads of that complement clauses can be of different syntactic
categories (e.g., nouns, verbs, and adjectives). On Factor 3, the controlling
heads of that complement clauses are predicative adjectives. To be precise, the
adjectives controlling that complement clauses on Factor 3 are likelihood adjectives
(e.g., probable), attitudinal adjectives (e.g., interesting), and factual/certainty
adjectives (e.g., evident). This indicates that the co-occurring features on Factor 3
index the authors' expression of their agreement, opposition, evaluation, and interpretation
of propositions. Similarly, to complement clauses are controlled by
predicative adjectives such as evaluative adjectives (e.g., appropriate, necessary)
and ease/difficulty adjectives (e.g., difficult, easy). Taken together, the positive-loading
Factor 3 features express authors' personal stance towards the propositions
in the that/to complement clauses. However, these constructions are impersonal
because their stance is not directly attributed to the authors. Based on these
interpretations, the interpretive label Evaluative Stance is proposed for the functional
dimension.
Figure 4.3 presents the distribution of move types for Dimension 3, allowing
comparison of all moves along a continuous parameter of variation labeled Evalu-
ative Stance. Move Type 14 (Stating limitations of the study) has the highest dimen-
sion score on Dimension 3, while Move Type 4 (Describing materials) has the low-
est mean score.
The Dimension 3 characteristics of Move Type 14 are illustrated by (45) and
(46), which contain frequent occurrences of extraposed it constructions (bolded),
that clauses controlled by adjectives (underlined), and to clauses controlled by adjectives
(capitalized).
(45) MOVE 14 (F3 score = 4.86)
It is interesting that the experiments in this paper were all carried out using
assays for genetic interference in somatic tissues of the animal in the first gen-
eration after injection. It is conceivable that distinct mechanisms might oper-
ate in longer term RNAi (REF.) or in specific tissues, such as the germline.
(46) MOVE 14 (F3 score = 4.62)

In the absence of atomic structure, it is not possible TO determine which resi-
dues are solvent exposed and thus are likely to make physical contact with the
microtubule and which ones contribute to the domain's structural organization.
In contrast, the moves at the other end of the continuum of Dimension 3 show no
concern for evaluative stance. An example of Move Type 4 (Describing materials)
is represented by (47), with a complete absence of features on Dimension 3.
(47) MOVE 4 (F3 score = -1.02)
A peptide encoding a conserved region of the C-terminal domain of SMD
was used by Sigma Genosys to raise antisera SC1 and SC2, both were used at
1:500 on Westerns. An mSYD2 N-terminal domain fusion protein was used
by Lampire Biological Laboratories to raise the antiserum SN1, used at 1:500
on Westerns following depletion of the antiserum with an acetone powder of
bacteria expressing a portion of the antigen. -COP antibody was used at
1:5000 (Ref.). KLC 6390 antibody was used at 1:1000 (Ref.). MitoTracker
Red CM-H2XRos, and COX-1 antibodies were used as indicated by Molecular
Probes. Golgi-58K antibody was used as indicated by Sigma. DIC antibody
was used as indicated by Chemicon International. SYN antibody was used as
indicated by Boehringer.
The preceding subsections have shown that research article Introductions (Move
Types 1–3) are very similar to Discussion sections (Move Types 12–15) with respect
to being conceptual (Dimension 1) and abstract (Dimension 2). However,
Figure 4.3 shows a different pattern with respect to Dimension 3: the Discussion
sections are highly marked for evaluative stance, while the Introductions avoid
these stance expressions. This is a strong difference, with one major exception:
Move Type 12 (Contextualizing the study) is marked by the absence of the stance
features associated with Dimension 3. This move typically begins the discussion
section, functioning as a kind of recap of the article introduction. As such, it is
usually descriptive rather than evaluative, providing the immediate background
for the following evaluative moves in the Discussion section. With respect to the
larger goals of this book, this finding provides a nice example of how a move approach
to discourse organization captures a linguistic pattern that would probably
go unnoticed otherwise. That is, while Discussion sections are generally evaluative
in stance, a move analysis of this section shows that the first move is usually
quite different in both its communicative functions and its associated linguistic
characteristics.
Dimension 4: Projected interpretation

The linguistic features associated with projected interpretations (Dimension 4) are
demonstratives, quantifiers, and that clauses controlled by verbs. The frequent use
of quantifiers, in conjunction with demonstratives, reflects the concern with the
specificity of textual reference in academic discourse. That complement clauses
controlled by verbs provide a means to talk about the information in the dependent
that complement clause. In this scientific discourse, the subject of these verbs
is usually an inanimate entity (e.g., the findings suggest that ...). The frequent use
of these verbs represents the authors' expression of degree of certainty or commitment
associated with the claim stated in the that complement clause. For instance,
demonstrate denotes a higher degree of certainty than suggest when used in the
context of the findings demonstrate/suggest that. The interpretation of the features
underlying Factor 4 predicts that the moves with the highest Dimension 4 score
will be more characterized by projected interpretations than other moves.
Figure 4.4 presents the distribution of move types along Dimension 4. Move
Type 11 (Commenting results) has the highest dimension score on Dimension 4,
while Move Type 5 (Describing experimental procedures) has the lowest mean score
on this dimension. Example (48) illustrates Move Type 11 at the positive end of
Dimension 4. The text shows a concern for expressing claims or generalizations,
with frequent use of demonstratives (bolded), quantifiers (underlined),
and that clauses controlled by verbs (italicized).
(48) MOVE 11 (F4 score = 24.56)
All these data suggest that recombinant mammalian retromer proteins can
form complexes in COS7 cells. These data, however, do not demonstrate wheth-
er the complexes are formed in the cytoplasm, on membranes, or both.
In contrast, (49) from Move Type 5 (Describing experimental procedures) reveals
an absence of positive-loading features of Dimension 4.
(49) MOVE 5 (F4 score =.40)
The Matchmaker™ two-hybrid system 2 was used according to the protocols
provided by the manufacturer. Using polymerase chain reaction-based
strategies, we subcloned the C-terminal 42 residues of the rLHR into the
pAS21 vector to generate a fusion protein with the GAL4 DNA binding do-
main. This plasmid was used as bait to screen a human kidney 293 cells
cDNA library constructed in the pACT2 vector to generate fusion products
with the GAL4 activation domain.
As shown, the move in (49) provides an objective description of experimental procedures.
No interpretation is involved, and so no claims are framed. Overall, the
relationships among moves shown in Figure 4.4 confirm the interpretation of Dimension 4
as distinguishing among texts along a continuum of projected interpretations.
Discourse with a focus on expression of generalizations has a high score on
this dimension, and discourse with no focus on making generalizations has a
markedly low score on this dimension.
Dimension 5: Attributed Knowledge vs. Current Study

The linguistic features associated with attributed knowledge (the positive end of
Dimension 5) are present tense verbs, attributed references, and type-token ratio
(TTR; see note 7). The co-occurrence of references to other studies with present tense verbs
indexes generalized background knowledge established by previous research in
the field. High type/token ratio in a text indicates that the discourse has a greater
variety of word types and integrates a higher amount of information (Biber, 1988).
Taken together, the co-occurrence of positive-loading Factor 5 features reflects a
focus on attributed knowledge, a crucial requirement in scientific discourse to
situate and contextualize the study being reported.
In contrast, the linguistic features associated with the current study (the nega-
tive pole of Dimension 5) are common nouns, pointers, past tense verbs, and prep-
ositions. Three of the four negative-loading Factor 5 features (common nouns,
past tense, and prepositions) load higher on Factors 1 and 2. As discussed earlier,
common nouns refer to entities or concepts; past tense verbs report completed
actions and do not assume generalization; and prepositions often modify and
specify nouns in a discourse. Pointers (e.g., see Figure 3, as shown in Table 6) direct
readers to visual representations accompanying the data presented. The co-occur-
rence of these negative-loading Factor 5 features suggests the focus on reporting
the actual findings of the current study.
Figure 4.5 presents the distribution of move types for Dimension 5. Move
Type 1 (Establishing a topic) has the highest positive dimension score, character-
ized by reference to attributed knowledge; Move Type 10 (Announcing results) has
the largest negative mean score, characterized by its focus on current findings.
Move Type 1 (Establishing a topic), represented by (50), contains frequent occurrences
of present tense verbs (bolded), references (underlined), and a high type-token
ratio, and it has markedly low frequencies of common nouns, pointers, past tense
verbs, and prepositions. Thus, (50) is typical of Move Type 1 in being highly attributed
in knowledge, providing background information of the field rather than
reporting findings from the current study.

7. TTR is the ratio between the number of different lexical items in a text (the types) and the
total number of words in that text (the tokens). Specifically, TTR is expressed as a percentage:
(types/tokens) × 100. Longer texts tend to have more repeated words and thus a much lower TTR.
If the TTR of a text is low, the text contains many repeated words. Conversely, if the TTR
is high, the text has fewer repeated words and greater lexical density.
In this study, all move observations under 25 words were excluded, and the 4,009 move observations
analyzed by multidimensional analysis have a range of 25 to 483 words, with a mean of
72 words. Given this broad range of move observation lengths, TTR is not going to be
comparable here, because the ratio always decreases as longer texts are included, and vice versa.

(50) MOVE 1 (F5 score = 13.34) (TTR= 27.25)
Interest in prenylation has stemmed from the discovery that key proteins in
multiple signal transduction cascades contain covalently attached isoprenoids
(Ref.). Perhaps the most notable examples are the Ras proteins. Mutated forms
of Ras proteins are found in 30% of all human tumors (Ref.). However, these
mutant Ras proteins are not oncogenic if they cannot be prenylated (Ref.).
Prevention of Ras prenylation thus holds promise as a new tactic for cancer
chemotherapy (Ref.). To this end, many prenylation inhibitors have been de-
veloped, several of which appear to be effective anticancer agents in animal
studies and are undergoing clinical trials (Ref.).
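The type-token ratio defined in note 7, (types/tokens) × 100, and its sensitivity to text length can be illustrated with a small Python sketch. The tokenization is an assumption made for illustration (lowercased runs of letters and digits, with internal hyphens and apostrophes kept); the study's own counting rules are not specified here, and the function name is hypothetical.

```python
import re


def type_token_ratio(text):
    """Return TTR as a percentage: distinct word forms / total words * 100."""
    tokens = re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())
    if not tokens:
        raise ValueError("text contains no tokens")
    return 100.0 * len(set(tokens)) / len(tokens)


short_text = "the cat saw the dog"
longer_text = short_text + " " + short_text  # same vocabulary, twice the length

print(type_token_ratio(short_text))   # 4 types over 5 tokens
print(type_token_ratio(longer_text))  # 4 types over 10 tokens
```

Doubling the text without adding new vocabulary halves the ratio, which is the length effect the note warns about when moves of very different sizes are compared.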
Conversely, Move Type 10 (Announcing results), represented by (51), is characterized
by frequent occurrences of common nouns (italicized), pointers (underlined),
past tense verbs (bolded), and prepositions (capitalized), by relatively infrequent
use of present tense verbs and references, and by a relatively low type-token ratio.
(51) MOVE 10 (F5 score = -36.66) (TTR = 14)
Moreover, these troughs were labeled WITH antibodies against -catenin
(Pointer) and were flanked BY desmosomes associated WITH thick bundles
OF keratin intermediate filaments (Pointer). AT late times, the undulating
cell-cell border had flattened, and the epithelium appeared AS a sheet, with
continuous contacts OF alternating desmosomes and Adherens junctions
(Pointer).
Similar to Dimension 3, the distribution of move types along Dimension 5 shows
the importance of a move-analytical approach. In this case, only one move type
(Move 1, which is usually the very first move in an article) is especially marked for
the use of attributed knowledge features. In contrast, all other move types (including
the other two move types found in the article Introductions) are marked by
the focus on current findings (to differing extents). The interesting finding here is
that these research article Introductions are not at all homogeneous. Rather, the
first move in the Introduction is functionally distinctive, serving the communica-
tive purpose of establishing the topic, and the MD analysis shows that this move
type is linguistically distinctive as well.
Dimension 6: Stated Purpose

Dimension 6 is composed mostly of features with positive loadings. Whether/if
introduces dependent yes/no interrogative clauses, expressing
indirect questions as indirect speech reports. First person pronouns reflect the active
role of the authors. The co-occurrence of first person pronouns, infinitive marker to,
and whether/if indicates the authors' deliberate purpose of addressing intellectual
research questions and constructing relevant strategies to answer those questions.
The other two positive-loading Factor 6 features are two types of to comple-
ment clauses, controlled by verbs and controlled by adjectives. The controlling
verbs of to complement clauses on this factor are modality/causation/effort verbs
(e.g., attempt, try, seek), while the controlling adjectives are ability and willingness
adjectives (e.g., able, determined, sufficient). Both modality/causation/effort verbs
and ability and willingness adjectives index the authors' expression of specific and
definite objective(s) of the study. (The two negative-loading Factor 6 features,
prepositions and type-token ratio, are not unique to Factor 6 because of their sali-
ent loadings on other factors.) All in all, based on the interpretation of the com-
municative functions represented by positive-loading features on Factor 6, the
interpretive label Stated Purpose is proposed for this dimension.
Figure 4.6 presents the distribution of move types for Dimension 6. Move Type
15 (Suggesting further research) has the highest dimension score, while Move Type
5 (Describing experimental procedures) has the lowest mean score. The interpreta-
tion of the features comprising this factor predicts that moves with the highest Di-
mension 6 score will be more characterized by purposive statements than others.
According to Figure 4.6, Move Type 14 (Stating limitations of the study) and Move
Type 15 (Suggesting further research), represented by (52–53) and (54) respectively,
serve purposive functions, reflected by frequent occurrences of to infinitives
(italicized), whether/if (underlined), to complement clauses controlled by verbs,
first person pronouns (capitalized), and to complement clauses controlled by adjectives
(bolded).
(52) MOVE 14 (F6 score = 1.59)
However, because the neural tube becomes disordered in Nup50 mutant ani-
mals, it is difficult to determine if the alterations in p27Kip1 expression in
Nup50-null animals are the cause or the consequence of the neural tube abnor-
malities. A mechanistic understanding of these abnormalities thus must await a
clearer understanding of Nup50 function in nucleocytoplasmic transport.
(53) MOVE 14 (F6 score = 2.28)
However, WE were unable to detect any morphological changes in duct cells
consistent with this hypothesis, although WE cannot rule out the possibility
that functional changes have occurred that are unapparent morphologically.
(54) MOVE 15 (F6 score = 3.14)

Future experiments will be necessary to ascertain whether a similar mecha-
nism is involved at puncta, or whether the physical interaction between vin-
culin or zyxin and Adheren junction components are alone sufficient to pro-
mote VASP/Mena association and activation.
In contrast, the absence of the linguistic features that characterize the positive end
of Dimension 6 shows less or no use of purposive statements, as shown in (55)
from Move Type 5 (Describing procedures).
(55) MOVE 5 (F6 score = -41.87)
Primer extension assays were performed as described previously (Ref.) using
22/44 and 24/44 primer-templates with or without 16- or 14-mer downstream
oligonucleotides (Pointer). 200 fmol of primer-template was incubated with pol
at 37 °C in 10-µl reactions containing 500 µM dNTPs unless otherwise indicated.
Reaction times and enzyme concentrations are indicated in the figures.
Dimension 7: Contradictory proposition

Finally, Dimension 7 is composed of only four co-occurring linguistic features.
Concessive markers and not negation are semantically transparent in their function
of negating a proposition. The meta-textual device of pointers (e.g., See Table
1, Figure 3, etc.) directs readers' attention to visual accompaniments of particular
findings. And adverbs often provide qualifications of propositions. The functions
of these linguistic features taken together contribute to the pragmatic function of
expressing contradiction.
Figure 4.7 presents the mean dimension score of each move type along Dimension 7.
Move Type 2 (Preparing for the present study) has the highest dimension
score on Dimension 7, while Move Type 4 (Describing materials) has the lowest
mean score. Move Type 2, represented by (56), demonstrates the positive features of
Dimension 7: concessive markers (bolded), pointers (capitalized), not negation (underlined),
and adverbs (italicized).
(56) MOVE 2 (F7 score = 13.44)
However, forms of Ras that are incompletely modified have received little
study (Ref.), largely because of the assumption, based on direct physical stud-
ies, that prenyl proteins are fully and completely modified (POINTER). It is
still not known if all functions of oncogenic Ras require prenylation or if some
effector pathways may remain active regardless of prenylation state.
In contrast, (57) representing Move Type 4 shows little or no concern with ex-
pressing contradiction, or with the logical comparison of possibilities at all, as re-
flected by the absence of positive-loading features on Factor 7.
(57) MOVE 4 (F7 score =.55)

Human kidney 293T cells are a derivative of 293 cells that express the SV40T
antigen (Ref.) and were provided to us by Dr. Marlene Hosey. HeLa cells were
obtained from Dr. Dawn Quelle. The 9E10 hybridoma cell line was obtained
from the American Type Culture Collection. Purified hCG was kindly pro-
vided by the National Hormone and Pituitary Agency. 125I-hCG was pre-
pared using the purified hCG as described elsewhere (Ref.).
Overall, the relationships among moves shown in Figure 4.7 confirm the interpre-
tation of Dimension 7 as distinguishing among texts along a continuum of Contra-
dictory proposition.
Overall Multi-Dimensional Profile of Move Types

Figure 4.8 summarizes the overall relations among four of the move types from
biochemistry research articles: Move Type 2: Preparing for the present study, Move
Type 5: Describing procedures, Move Type 8: Restating methodological issues, and
Move Type 14: Stating limitations of the study (selected moves from the Introduction,
Methods, Results, and Discussion sections, respectively; see note 8).
Move Type 2 (Preparing for the present study) has the highest mean scores on
Dimensions 1 and 7, reflecting a relatively marked emphasis on conceptual infor-
mation (Dimension 1) and a focus on expression of contradiction (Dimension 7).
Move Type 2 is unmarked with respect to Dimensions 2–6, suggesting a mixed
focus of concrete and abstract discussion (Dimension 2), a moderate use of evalu-
ative stance (Dimension 4), a mixed focus on attributed information and current
findings generated by the study (Dimension 5), and a moderate use of purposive
statement (Dimension 6).
Move Type 5 (Experimental procedures) stands out as having markedly low
scores on Dimensions 4 and 6, showing no concern for framing claims or generali-
zations (Dimension 4) or expressing purposive statements (Dimension 6). Move
Type 5 also has relatively low scores on Dimensions 1, 3, and 7, indicating a moder-
ate use of specific reference (Dimension 1), and little concern for expressing evalu-
ative stance (Dimension 3) and contradiction (Dimension 7). This move type has a
moderately high mean score on Dimension 2, showing a relatively high emphasis
on scientific activities. Move Type 5 has an intermediate mean score on Dimension
5, suggesting a mixed use of attributed information and current findings.
Move Type 8 (Restating methodological issues) has a markedly low score on
Dimension 5 and a moderately low score on Dimension 7, suggesting an emphasis
on current findings (Dimension 5) and little concern for contradiction (Dimension 7).
Move Type 8 has intermediate scores on Dimensions 1, 2, 3, 4, and 6, indicating
that this move has a mixture of both conceptual and specific reference
(Dimension 1), both concrete and abstract discussion (Dimension 2), a certain
amount of evaluative stance (Dimension 3), generalizations (Dimension 4), and
purposive statements (Dimension 6).

8. In order to facilitate comparisons across the seven dimensions, each of the dimension scores
was transformed to a new scale. That is, dimension scores were multiplied/divided by a scaling
coefficient so that all dimensions are presented within the same scale from plus to minus 10.

Move Type 14 (Limitation of the study) has a markedly high score on Dimen-
sion 3, suggesting a high density of evaluation (Dimension 3). This move has mod-
erately high mean scores on Dimensions 4, 6, and 7, indicating a substantial amount
of generalizations (Dimension 4), purposive statements (Dimension 6), and con-
tradictory statements (Dimension 7). Move Type 14 has a moderately low mean
score on Dimension 2, suggesting relative focus on abstract concepts rather than
concrete research activities. Move Type 14 has intermediate scores on Dimensions
1 and 5, indicating that this move has a mixed characteristic of both conceptual and
specific reference (Dimension 1), and both attributed knowledge and current find-
ings (Dimension 5). As shown in Figure 4.8, each move type has a different profile
across the seven dimensions. As a consequence, the relation between any two moves
needs to be based on consideration of all seven dimensions.
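The rescaling mentioned in note 8, which places every dimension on a common scale from plus to minus 10, can be sketched as follows. The scaling coefficients actually used are not reported, so this sketch assumes one plausible choice: for each dimension, the mean scores are multiplied by the coefficient that maps the largest absolute mean score onto 10. The function name and the toy scores are hypothetical.

```python
def rescale_to_plus_minus_10(mean_scores, limit=10.0):
    """Rescale one dimension's mean scores so the largest magnitude equals `limit`.

    `mean_scores` maps a move type label to its mean score on that dimension.
    """
    peak = max(abs(score) for score in mean_scores.values())
    if peak == 0:
        # All means are zero: nothing to rescale.
        return {move: 0.0 for move in mean_scores}
    coefficient = limit / peak  # assumed scaling coefficient
    return {move: score * coefficient for move, score in mean_scores.items()}


# Hypothetical mean scores of three move types on a single dimension.
scaled = rescale_to_plus_minus_10({"Move 2": 20.0, "Move 5": -5.0, "Move 8": 4.0})
print(scaled)
```

Because each dimension is multiplied by a single coefficient, the ordering of move types and the sign of each score are preserved; only the units change, which is what allows profiles across dimensions to be drawn on one axis.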
The multidimensional profile is also useful in differentiating two different
moves that are superficially emblematic of similar functions but belong to different
sections of research articles. Move Types 2 and 14 illustrate such differences. Move
Type 2 (Preparing for the present study) from the Introduction section generally
contains the scientists' evaluation of existing research in order to justify the need
for the current study. Move Type 14 (Stating limitations of the study) from the
Discussion section expresses the scientists' evaluation of their own study,
acknowledging that their study is not perfect and that limitations are inevitable. Thus, the two
moves are functionally similar to a certain extent: both moves involve evaluation.
This raises the question of whether these two moves should be labeled as two dis-
tinct move types, simply because they belong to different sections and interact
with their neighboring moves differently. Or, instead, should these two function-
ally similar moves be considered the same move type that occurs in two different
sections? To address this question, a look at representative text samples and the
multidimensional profiles of these moves is informative.
(58) MOVE 2
These findings raise the possibility that the differential expression of genes by
lateral motor column and lateral motor column neurons and by ventral and
dorsal limb mesenchymal cells coordinates this binary choice in motor axon
trajectory.
(59) MOVE 14
It is interesting that the experiments in this paper were all carried out using
assays for genetic interference in somatic tissues of the animal in the first gen-
eration after injection. It is conceivable that distinct mechanisms might oper-
ate in longer term RNAi (REF.) or in specific tissues, such as the germline.
Examples (58) of Move Type 2 and (59) of Move Type 14 show that they are stylisti-
cally different, and this stylistic difference can be captured by multidimensional
analysis. Based on Figure 4.8, Move Types 2 and 14 are similar on Dimensions 2, 4,
5, 6, and 7. However, they are different to a certain extent on Dimensions 1 and 3.
That is, on Dimension 1, Move Type 2 has the highest score, and Move Type 14 has
a relatively high mean score (Conceptual vs. Specific Reference), suggesting that Move
Type 2 focuses on a conceptual current state of knowledge of the topic, whereas
Move Type 14 narrows down the scope of the reference to the current study only,
and thus is relatively more specific with regard to referential information.
The difference between the two move types is more distinct with respect to
Dimension 3 (Evaluative stance): Move Type 14 has the highest mean score, where-
as Move Type 2 has a negative mean score, reflecting much more use of evaluative
stance features in Move Type 14 than in Move Type 2. This difference in mean
scores indicates that in Move 14, when the scientists present evaluations of their
own study, they tend to background their identification by using the positive Di-
mension 3 features: extraposed it, predicative adjectives, and that or to comple-
ment clauses controlled by evaluative adjectives. In contrast, the negative mean
score of Move Type 2 on Dimension 3 indicates little use of these features when the
scientists comment on previous research. Despite their similar function of ex-
pressing evaluation as determined by move analysis, Move Types 2 and 14 are in-
deed stylistically distinct on the parameter of Dimension 3: Evaluative stance.
8 Multi-dimensional variation among move types within the same section
It is known that a research article typically has a well-defined four-section
organization, known as IMRD (Introduction-Methods-Results-Discussion). Each
section is clearly marked for its distinct communicative purpose by the name of
the section. As shown in the present study, each section also consists of a number
of rhetorical moves. For instance, the Introduction section consists of three main
moves (Move Types 1–3).
The moves from a section tend to pattern together on most dimensions, lead-
ing to the conclusion that the sections are linguistically uniform. For example,
Figure 4.9 shows that the three move types within Introductions are very similar
in their characterizations with respect to most dimensions. However, there are also
differences. For example, Move Type 2 (Preparing for the present study) has a mark-
edly high dimension score on Dimension 7, indicating a focus on expressing con-
tradiction. Similarly, Move Type 1 (Establishing the topic) has a high score on Di-
mension 5, indicating a high use of attributed knowledge in order to establish the
centrality of the topic being presented. These distinctive characteristics of Move
Types 1 and 2 demonstrate that Introduction sections are not homogeneous.
Although these three move types are related in their communicative functions and
linguistic MD characterizations, they also each have distinct micro-purposes and
linguistic characteristics.
Figure 4.10 plots the mean scores for the two main move types within Meth-
ods sections, showing that Move Type 4 (Describing materials) and Move Type 5
(Describing procedures) are relatively uniform linguistically. That is, although there
are some minor differences between these two move types, they are consistently
similar in their multi-dimensional characteristics when compared to the other
move types. In this case, we can identify two distinct communicative purposes
within Methods sections, but those functions are related to each other, and there
are few linguistic differences between these move types.
Figure 4.11 similarly shows that the four move types of the Results section
(Move Types 8–11) are very similar in their linguistic characteristics. That is, all of
the four Results move types have intermediate scores on Dimensions 1, 2, 3, 6, and
7. However, there are also some differences: Move Type 10 (Stating results) has an
extremely large negative score on Dimension 5 (Attributed knowledge vs. Current
study), indicating a focus on the current study; and Move Type 11 (Commenting on
the results) has a markedly large positive score on Dimension 4 (Projected interpre-
tation), indicating a focus on making interpretations.
This same general pattern of linguistic homogeneity holds for the moves within
Discussion sections (Move Types 12–15; see Figure 4.12). However, here also
there are some differences. For instance, Move Type 14 (Stating limitations) has an
extremely large positive score on Dimension 3 (Evaluative stance), indicating a
high density of evaluative stance. In contrast, Move Type 12 (Contextualizing the
study) has a negative score on this dimension, reflecting the absence of evaluative
stance expressions.
These patterns demonstrate the value of the novel application of multidimensional
analysis in analyzing move types, reflecting the internal discourse organization
of research articles. Without this fine-grained level of analysis, these linguistic
differences might not have been noticed. Thus, this study illustrates the power of
integrating these two approaches to discourse analysis: move analysis and multi-
dimensional analysis. Each approach provides a partial analysis of the discourse.
Without the initial move analysis, the communicative variation within sections
would have been missed. Similarly, without the multi-dimensional analysis, the
linguistic characteristics of rhetorical move types would not have been systemati-
cally characterized, and moves with similar functions would not have been dif-
ferentiated linguistically. The study thus shows the complementary nature of these
two steps in the analysis for characterizing textual variation at both macro and
micro levels.
Figure 4.1 Mean scores of moves along Dimension 1: Conceptual vs. Specific Reference
Figure 4.2 Mean scores of moves along Dimension 2: Concrete Action vs. Abstract Discussion
Figure 4.3 Mean scores of moves along Dimension 3: Evaluative Stance
Figure 4.4 Mean scores of moves along Dimension 4: Projected Interpretation
Figure 4.5 Mean scores of moves along Dimension 5: Attributed Knowledge vs. Current Study
Figure 4.6 Mean scores of moves along Dimension 6: Stated Purpose
Figure 4.7 Mean scores of moves along Dimension 7: Contradictory Proposition
Figure 4.8 Multidimensional profiles of moves, highlighting Move 2 (Preparing for the
present study), Move 5 (Describing experimental procedures), Move 8 (Restating
methodological issues), and Move 14 (Stating limitations of the study)
Figure 4.9 Multidimensional profile of the Introduction moves: Move 1 (Establishing a
topic; dashed line), Move 2 (Preparing for the present study; dotted line), and Move 3
(Introducing the present study; solid line)
Figure 4.10 Multidimensional profile of Methods moves: Move 4 (Describing materials;
dotted line) and Move 5 (Describing experimental procedures; dashed line)
Figure 4.11 Multidimensional profile of the Results moves: Move 8 (Restating
methodological issues), Move 9 (Justifying methodological issues), Move 10 (Announcing results),
and Move 11 (Commenting on results)
Figure 4.12 Multidimensional profile of the Discussion moves: Move 12 (Contextualizing
the study), Move 13 (Consolidating results), Move 14 (Stating limitations of the study), and
Move 15 (Suggesting further research)
Table 4A Descriptive dimension statistics for all move types
Move Observation Mean S.D. Minimum Maximum
Dimension 1 1 251 9.9100 10.2693 22.60 49.42
2 45 11.8939 13.2279 14.89 46.06
3 65 10.4042 11.3340 14.78 42.95
4 60 8.8137 14.4933 41.47 20.27
5 485 4.1863 11.1414 36.90 32.54
8 550 4.9533 12.9123 39.04 44.58
9 252 7.1886 12.5910 30.68 43.59
10 911 4.8427 12.4696 48.02 41.97
11 449 6.3406 12.2484 29.18 44.83
12 361 6.8747 11.2302 25.00 43.16
13 532 6.0781 11.0062 26.21 38.99
14 27 7.6219 11.5713 21.20 29.69
15 21 9.7121 10.4349 13.32 29.78
Total 4009 4.7541 12.6066 48.02 49.42
Dimension 2 1 251 17.5677 7.1427 42.32 4.36
2 45 17.7664 9.3825 38.49 2.09
3 65 18.9329 5.6777 30.47 8.14
4 60 .8460 10.5354 32.48 21.02
5 485 5.8697 8.2884 39.50 18.25
8 550 13.0989 9.3133 40.47 13.25
9 252 16.7269 9.6283 38.50 15.79
10 911 14.0587 9.2742 47.48 15.89
11 449 18.6941 8.9365 42.86 7.21
12 361 18.5847 8.3080 46.99 4.37
13 532 20.1220 7.3091 49.48 4.36
14 27 20.1287 10.4172 48.69 1.35
15 21 21.6619 11.1997 43.78 1.19
Total 4009 15.0587 9.8304 49.48 21.02
Dimension 3 1 251 .4584 .9535 1.02 4.99
2 45 .3413 2.2196 1.02 11.88
3 65 .6987 .5571 1.02 .90
4 60 .8928 .4586 1.02 1.15
5 485 .8472 .5474 1.02 3.14
8 550 .4604 1.3653 1.02 10.98
9 252 .1337 1.8041 1.02 10.08
10 911 .3474 1.3442 1.02 9.62
11 449 .5721 2.4746 1.02 12.30
12 361 .2526 1.4341 1.02 10.74
13 532 .1682 1.8192 1.02 11.49
14 27 1.7880 3.5792 1.02 8.67
15 21 .7187 3.2546 1.02 9.80
Total 4009 .2308 1.6322 1.02 12.30
Dimension 4 1 251 1.0425 2.5298 2.20 13.19
2 45 3.0223 4.4581 2.20 15.65
3 65 2.2735 2.7794 2.20 12.08
4 60 .6286 3.0781 2.20 15.98
5 485 .8221 1.8644 2.20 9.22
8 550 .4196 3.2853 2.20 13.80
9 252 1.3391 3.5941 2.20 19.23
10 911 1.1888 3.2782 2.20 18.38
11 449 4.4347 4.7652 2.20 23.72
12 361 1.6635 2.8445 2.20 13.80
13 532 2.7587 3.4469 2.20 20.02
14 27 2.7309 4.4515 2.20 13.44
15 21 1.4118 3.7564 2.20 9.22
Total 4009 1.4774 3.6339 2.20 23.72
Dimension 5 1 251 2.6428 11.9913 47.97 34.76
2 45 6.2822 11.3331 34.35 16.85
3 65 5.2462 15.6066 57.70 23.41
4 60 9.7019 10.3624 34.76 14.62
5 485 2.6463 11.9192 37.38 32.84
8 550 16.2223 11.1704 52.99 14.05
9 252 9.3463 11.1193 41.88 21.82
10 911 17.5364 11.2713 51.00 15.40
11 449 10.2037 11.8279 53.62 18.11
12 361 1.8498 12.5951 41.16 38.74
13 532 2.5816 12.4715 38.41 34.01
14 27 5.6342 10.6608 27.32 18.33
15 21 8.0764 11.3309 32.50 19.45
Total 4009 8.9857 13.6489 57.70 38.74
Dimension 6 1 251 26.5783 8.3568 56.88 3.51
2 45 15.0389 6.8015 29.87 10.18
3 65 21.8115 9.8892 44.22 .23
4 60 14.7642 6.5455 26.60 .42
5 485 27.3203 9.9091 56.42 1.12
8 550 14.0081 6.7782 36.09 10.82
9 252 17.3601 6.9605 32.02 5.72
10 911 19.6434 6.5368 38.56 4.43
11 449 16.1855 6.2292 32.61 2.24
12 361 21.3855 8.6880 52.00 .75
13 532 22.2919 8.7855 47.89 4.72
14 27 12.5584 6.8255 24.91 2.28
15 21 11.5088 7.6612 21.12 3.14
Total 4009 20.0308 8.8577 56.88 10.82
Dimension 7 1 251 .6756 2.2510 2.35 10.17
2 45 4.2911 4.0254 2.35 15.50
3 65 .1782 2.1534 2.35 6.64
4 60 .8970 2.1400 2.35 5.99
5 485 .8082 1.4866 2.35 6.16
8 550 .6487 2.3168 2.35 11.94
9 252 .4396 3.1371 2.35 15.83
10 911 3.6772 3.8692 2.35 19.47
11 449 1.9431 3.7519 2.35 17.65
12 361 .9993 2.7780 2.35 16.41
13 532 1.7876 2.9378 2.35 15.51
14 27 2.8591 3.3072 2.35 9.19
15 21 1.0211 3.0093 2.35 8.00
Total 4009 1.3258 3.4480 2.35 19.47
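The columns of Table 4A (number of observations, mean, standard deviation,
minimum, and maximum) are standard descriptive statistics computed separately for
each move type on each dimension. A minimal sketch of such a summary is given
below. It assumes that each observation is a pair of move type and dimension score
and that the sample standard deviation is intended; the source does not say which
variant was used. The function name and the data are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def describe_by_move(observations):
    """Summarize (move_type, score) pairs as n, mean, S.D., minimum, maximum."""
    grouped = defaultdict(list)
    for move, score in observations:
        grouped[move].append(score)

    summary = {}
    for move, scores in sorted(grouped.items()):
        summary[move] = {
            "n": len(scores),
            "mean": mean(scores),
            # Sample standard deviation (an assumption); undefined for n = 1,
            # so 0.0 is reported in that case.
            "sd": stdev(scores) if len(scores) > 1 else 0.0,
            "min": min(scores),
            "max": max(scores),
        }
    return summary


# Hypothetical dimension scores for two move types
sample = [(1, 2.0), (1, 4.0), (1, 6.0), (2, -1.0), (2, 1.0)]
for move, stats in describe_by_move(sample).items():
    print(move, stats)
```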
Chapter 5
Rhetorical appeals in fundraising
with Molly Anthony & Kostyantyn Gladkov
This chapter presents another top-down approach to analyzing discourse
structures in fundraising letters: rhetorical persuasion analysis. As outlined in
Chapter 1 (see Table 1.1), the first step in a top-down corpus-based analysis
of discourse organization is to determine the set of possible functional/
communicative categories of discourse units. Instead of describing texts
according to their semantic/functional purposes, as in move analysis, rhetorical
persuasion analysis divides texts into sections using the three basic means of
Aristotelian persuasion: ethos, pathos and logos.
This chapter gives background about rhetorical persuasion, describes the set
of rhetorical appeals, applies the text analysis to the ICIC corpus of fundraising
letters, and attempts to describe the sequencing and location of rhetorical
appeals in relation to rhetorical moves, explained in Chapter 3.
1 Elements of persuasion
Persuasion has been a topic of interest to scholars since ancient times. The first and
most influential theory of persuasion was developed by the Greek philosopher
Aristotle. According to Aristotle, in order to make an argument, one has to study
the following categories: the means or sources of persuasion, the language, and the
arrangement or organization of the various parts of the treatment. The means or
sources of persuasion are strategies for making three appeals, those of ethos,
pathos, and logos: "The first kind depends on the personal character of the speaker,
the second on putting the audience into a certain frame of mind, the third on the
proof, provided by the words of the speech itself" (Aristotle, 1984, p. 2155). The
language of the argument had to be carefully crafted; word choice was important,
and the use of appropriate topoi, or themes, and metaphors, or tropes, was
encouraged. Finally, the arrangement was also important: "A speech has two parts.
You must state your case, and you must prove it" (Aristotle, 1984, p. 2257). A
well-organized or properly arranged speech had three parts: introduction, argument
and counter argument, and epilogue. "What you should do in your introduction is to
state your subjects, in order that the point to be judged may be quite plain; in the
epilogue you should summarize the argument by which your case has been proved"
(Aristotle, 1984, p. 2268).
Naturally, the organizational parts (introduction, argument, counterargument, and
epilogue) are important in making any formal argument. As moves, they could
be expected to be obligatory and appear in a typical sequence. In truly persuasive
discourse, however, the three means of persuasion (ethos, pathos, logos) are
equally or more important. They can be identified in texts using a top-down,
discourse-structural analysis.
Let us first examine how Aristotle defined these three categories of persuasion.
Firstly, ethos is used in order to create a positive character of the writer. For
Aristotle, "the character of the speaker is a cause of persuasion when the speech is so
uttered as to make him worthy of belief; for as a rule we trust men of probity more,
and more quickly about things in general, while on points outside the realm of
exact knowledge, where opinion is divided we trust them absolutely" (p. 8). That
is, persuasion can be achieved only by those speakers who appear to be positive
characters for the audience. According to Aristotle, the worthiness of the cause
presented by speakers depends upon the worthiness and reliability of the speakers
themselves; thus, the speaker's image in the eyes of the audience is the crucial point
against which the audience will test the worthiness of the cause. Therefore, it is the
writers' responsibility to produce such an image of themselves that they would be
thought of as reliable and unfailing people.
Pathos, another of the basic means of persuasion, is used when the audience is
set into an emotional state by the speaker. "Persuasion is effected through the
audience when they are brought by the speech into a state of emotion; for we give very
different decisions under the sway of pain or joy, liking or hatred" (Aristotle, 1932,
p. 9). Only an emotionally unsettled audience can react in the way the speaker intends
it to react, because emotions serve as an impulse to take a certain action, and very
often the audience will look at the presented case through the prism of their emo-
tions. As Aristotle mentioned, to the audience that is eager and hopeful, the pro-
posed object will seem as a valuable and worthy thing, while to the audience that is
pessimistic and distrustful, the same object will seem the opposite (p. 91).
Lastly, logos is employed when the speaker appeals to the reasonable side of
the audience by utilizing rational arguments. As Aristotle noted, "persuasion is
effected by the arguments, when we demonstrate the truth, real or apparent"
(Aristotle, 1932, p. xlii). In other words, facts, statistics, and information constitute an
essential part of persuasion.
Aristotle's rhetorical theory is powerful; it has survived through centuries.
Rhetoricians and philosophers of all centuries appreciate it, highly commend it and
use it as the solid basis for their own theories. Chaim Perelman was one of the 20th
century rhetoricians who adapted the ancient rhetorical theory to the contempo-
rary discourse of persuasion. What Perelman did in his theory was the enlarge-
ment of understanding of argument by noting numerous subforms of argumenta-
tion (Arnold, 1982, p. 10). Exploring rhetorical theories of ancient philosophers,
Perelman added some fundamental points to his theory of argumentation. For ex-
ample, while describing one of the basic types of the arguments, namely, the Argu-
ment Based on the Structure of Reality (logos, in Aristotelian theory), he developed
a new subtype of argument called Cause and Effect. "It is thus the truth of an idea
that can be judged by its effects" (Perelman, 1982, p. 83). This kind of argument
allows the audience to appraise the cause the speaker proposes through the
consequences this cause will have in the future. Another new subtype of argument
developed by Perelman was the argument of Model. According to Perelman, this
type of argument can be employed when the speaker tries to persuade the audience
through presenting a specific case as a model to be imitated. Thus, Perelman, led by
the necessities of contemporary discourse of argumentation, used Aristotelian the-
ory of persuasion to develop his theory of new rhetoric.
Aristotle's and Perelman's rhetorics have influenced research in composition
and rhetoric in the U.S. (Corbett, 1965; Kinneavy, 1971). However, scholars in
applied linguistics have not typically sought answers to persuasion in classical
rhetoric. For example, consider the extensive research on academic persuasion by
Hyland (1999a; 2001; 2002a; 2002b; 2004a). Although Hyland writes, "In fact, the
ways that writers establish their credibility (or create an ethos) and consider
readers' potential attitudes to the argument (pathos) date back to Aristotle" (2004a,
p. 89), his analyses do not connect with the theories of Aristotle and new
rhetoricians. Instead, his persuasion categories seem to stem directly from the data.
Persuasive appeals analysis can be applied to any kind of persuasive text, e.g.,
essays, sales letters and reports. Specific appeals strategies may vary across genres,
but the basic appeal types (logos, ethos and pathos) apply. The appeals analysis,
like the other analyses in this book, provides one hundred percent coverage of the
text. In other words, every sentence and word in the text can be categorized by one
of the appeals.
We believe that Aristotle's and Perelman's theories provide an important
complementary perspective on the discourse of direct mail letters. While rhetorical
moves segment texts according to the communicative functions of texts, the pri-
mary role of appeals is not necessarily to communicate information for the reader.
Instead, their intent is to make the reader act and, in the case of fundraising ap-
peals, to donate. Even if appeals communicate information and facts about a cause,
they do not do so for information's sake but to make the reader do something
about the situation, in the form of giving money. In fact, even in logos, the writer
has at his/her disposal a variety of different strategies for the best possible persua-
sive outcome. The writer can use facts and statistics, cause-effect description, sto-
ries, etc. The study of appeals reveals how different strategies are used, not only to
communicate but also to affect action. This chapter will apply a rhetorical appeals
analysis to the ICIC sub-corpus of fundraising letters, introduced in Chapter 3.
2 Determining and analyzing rhetorical appeals
In order to evaluate the degree of persuasion in the direct mail letters, a working
system of appeals was needed. Such a system is found in Connor and Lauer's (1985)
work on persuasive writing. This system of persuasive strategies was designed and
successfully used for teaching and evaluating college-level students' argumentative
essays. It includes 23 persuasive appeals with 14 rational appeals (logos), 4 credibil-
ity appeals (ethos), and 5 affective appeals (pathos). However, for the present analy-
sis of fundraising discourse, only 19 of these appeal types are relevant, summarized
in Table 5.1. Table 5A at the end of the chapter includes examples of each appeal.
Table 5.1 Definitions of rhetorical appeals
Rational appeals (Logos)
R1 Descriptive Example
Using a compelling descriptive example from one's own or someone else's experience
R2 Narrative Example
Using a compelling narrative example. Must contain a beginning, middle, and end of the
story
R3 Classification
Placing in a class or unit, and describing what that means
R4 Comparison
Using comparison to support one's focus
R5 Contrast
Using contrast to support one's focus
R6 Degree
Arguing that two things are separated by a difference of degree rather than kind, or making
an appeal for an incremental change
R7 Authority
Using the authority of a person other than the writer
R8 Cause/effect Means/end Consequences
Showing how one event is the cause of another
R9 Model
Proposing a model for action that relies on existing programs
R10 Stage in process
Reviewing previous steps and looking forward to what steps need to be taken
R11 Ideal or Principle
R12 Information
Using supporting facts and statistics
Credibility appeals (Ethos)
C13 First hand experience
Providing information to show first hand experience or some authority on the subject
C14 Showing writer's respect for audience's interests and point of view
C15 Showing writer-audience shared interests and points of view
C16 Showing writer's good character and/or judgment
Affective appeals (Pathos)
A17 Appealing to the audience's views (emotional, attitudinal, moral)
A18 Vivid picture
Creating a thought, a mind's eye vision
A19 Charged language
Using strong language to arouse emotions
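The coding scheme in Table 5.1 can be represented as a simple lookup from appeal
code to appeal category, which makes it straightforward to tally coded text units
by category. The sketch below is only an illustration: the codes and labels come
from Table 5.1 (some labels are shortened), but the data layout, the function
name, and the pairing of text units with codes are assumptions, not the procedure
used in the study.

```python
from collections import Counter

# Appeal code -> (category, short label), following Table 5.1
APPEALS = {
    "R1": ("logos", "Descriptive Example"),
    "R2": ("logos", "Narrative Example"),
    "R3": ("logos", "Classification"),
    "R4": ("logos", "Comparison"),
    "R5": ("logos", "Contrast"),
    "R6": ("logos", "Degree"),
    "R7": ("logos", "Authority"),
    "R8": ("logos", "Cause/effect Means/end Consequences"),
    "R9": ("logos", "Model"),
    "R10": ("logos", "Stage in process"),
    "R11": ("logos", "Ideal or Principle"),
    "R12": ("logos", "Information"),
    "C13": ("ethos", "First hand experience"),
    "C14": ("ethos", "Respect for audience's interests and point of view"),
    "C15": ("ethos", "Writer-audience shared interests and points of view"),
    "C16": ("ethos", "Writer's good character and/or judgment"),
    "A17": ("pathos", "Appealing to the audience's views"),
    "A18": ("pathos", "Vivid picture"),
    "A19": ("pathos", "Charged language"),
}


def tally_by_category(coded_units):
    """Count coded text units per category; an unknown code raises KeyError."""
    counts = Counter()
    for _text, code in coded_units:
        category, _label = APPEALS[code]
        counts[category] += 1
    return counts


# Text units quoted in this chapter, paired with the appeal each one illustrates
coded_units_example = [
    ("Please consider an increase in your contribution ...", "R6"),
    ("Pat LaCrosse asked me to send this information ...", "R7"),
    ("Through the efforts of about 300 volunteers ...", "R12"),
]
print(tally_by_category(coded_units_example))
```

Raising an error for an unknown code reflects the requirement that every unit of
text be assigned to one of the appeals in the scheme.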
2.1 Rational appeals (Logos)
Rational arguments are designed to appeal to the sensible and rational aspect of
the reader's mind. For example, the second type of argument, Narrative Example
(R2), contains a beginning, middle, and end of a story.
(1) Ted is a single father with three children under 10. He's never been on welfare
and he's always had a job doing manual labor ... There was a time when he felt
like he had no choice but to tolerate his wife's constant abuse and neglect of
their children. Then Ted decided the children deserved a chance to start over
in another town, no matter how difficult it might prove to be.
The author of this description is trying to make the reader see the dreadfulness of
the situation. Moreover, reading this example describing one family, the reader,
according to the logical rule of induction, infers a general conclusion that such an
example is true of a number of families. Such a conclusion, that this appalling fam-
ily situation is true not only of a single family, but of a number of families, makes
the reader willing to react to this appeal. Depicting a family of a certain type, the
author exemplifies all families of this particular type, thus intensifying the effect of
the appeal by implying that the number of unhappy families is actually bigger than
just one. Descriptive Example (R1) is another appeal of this type of argument. (Ta-
ble 5A at the end of the chapter provides again the definitions with an example of
each of the different appeal types.)
Another type of argument found in Aristotle's theory is the argument of
Classification (R3). This kind of argument places a person or a thing into a certain class
and then offers defining features. Our example of Classification would be as
follows: "In joining SCS, you join the ranks of those who believe that bringing art and
art education to the city makes life better, richer, and more rewarding for the entire
community." By making this rational appeal, the writer classifies and then defines
the reader as a member of a noble group by making him akin to a limited circle of
noble and distinguished individuals.
The next two appeals, Comparison (R4) and Contrast (R5), build a logical
argument on the relationship of like to like. In a fundraising letter, the appeal of
Comparison would sound as follows: "Our faculty-student ratio is 1:27. For law
schools in the United States, the range of faculty-student ratios is from 1:13 to 1:35,
but well over half of the law schools in the country have better ratios than we do."
Comparison supports a conclusion about a subject from a description of a related
subject; as in our example, the conclusion about one law school can be made from
descriptions of other law schools. Unlike Comparison, the appeal of Contrast
supports a conclusion on a subject by describing its counterpart. For instance:
(2) Unfortunately, our view of the importance of philanthropy is not shared by all
Americans. Many see philanthropy as no more than the grand gestures of the
rich. They do not understand, as you do, that the museums, parks, hospitals
and community organizations supported by philanthropy are the corner-
stones of our very quality of life.
In this example, the writer's opinion of the donor is raised by denigrating his/her
counterparts: people who do not donate.
The rational appeal of Degree (R6) in Aristotle's original theory is called an
argument of More or Less. The rational principle of this argument, according to
Aristotle (1932, p. 161), can be expressed by the following example: if the less frequent
thing occurs, then the more frequent thing would occur. In fundraising discourse,
one comes across the appeal of Degree in the form of asking for an increase in
donations. For instance: "Please consider an increase in your contribution to the
Girls Scout Annual Campaign." By employing this appeal, the writer implies that
if the donor has already given X amount of money, the next logical step would then
be to increase their donation.
One type of argument based on Perelman's category of person is the appeal of
Authority (R7) (Perelman, 1982). The argument of Authority relies on the
consistency between a person and his/her activities. In the argument from authority,
prestige is the quality that leads others to imitate acts of authoritative people. The
Authority appeal in fundraising discourse would employ a distinguished name to
make the reader act under the influence of someone who is authoritative. For
example: "Pat LaCrosse asked me to send this information inviting you to join the
Georgia O'Keefe Circle of the Indianapolis Museum of Art's Second Century
Society (SCS)." The author used the name of Pat LaCrosse without explaining who this
person is. The author assumes that the reader will be acquainted with Pat LaCrosse
and will consider his actions authoritative. "The example of the great is a
rhetorician of such power that it can persuade people to commit the most infamous acts"
(Perelman, 1982, p. 217). An important name brings the flavor of authoritativeness
to the discourse and makes it even more persuasive by presenting a model to be
imitated by the reader.
The appeal of Cause/Effect Means/End Consequences (R8) stems from both
Aristotle's and Perelman's (Perelman, 1982, p. 83) theories. According to Aristotle,
"Since it commonly happens that a given thing has consequences both good and
bad, you may argue from these [to their antecedents] in urging or dissuading, in
prosecuting or defending, in praising or blaming" (Aristotle, 1932, p. 166). This
appeal helps the writer to urge action on the reader's part by forecasting effects,
consequences, or ends. Perelman adds, "Consequences can be observed or
foreseen, ascertained or presumed. It is the truth of an idea that can only be judged by
its effects" (Perelman, 1982, p. 83). Thus, the writers of direct mail letters often
employ the Cause/Effect Means/End Consequences appeal to let the reader
evaluate an event through its described outcomes. For instance: "As one of only a few
zoos in the country that receives no local, state, or federal tax support, IZS must
depend on donations for general operating funds from corporations like yours."
Here, the reader is urged to contribute in order to supply necessary funds to the
organization that receives no local, state, or federal tax support, and as a
consequence must depend on donations.
The appeal of Model (R9), as discussed by Perelman, provides the reader with
a description of the way a proposed end can be achieved. A working model reflects
and supports the current case by a precedent. For instance:
(3) A group of your colleagues recently volunteered to help set the priorities for
this campaign. They surveyed members of the staff and faculty councils, ad-
ministrators and others and learned that we at IUPUI have a number of vital
concerns.
Here, the author gives the reader a precedent ("A group of your colleagues recently
volunteered") to make him/her follow this model and take the same actions.
Stage in Process (R10) is also an important argument in the theory of persua-
sion. According to Perelman, this appeal is used when a gap exists between the
concept accepted by the audience and the proposal the writer is defending. The
gap is closed by showing how the proposed action can be a stage in a process.
"Instead of going from A to D, one offers to lead the interlocutor first to B then to C
and finally to D" (Perelman, 1982, p. 87). In other words, when the audience might
think that the distance between the initial and final stage or goal of the process is
impossible to cover, the writer creates one or more middle stages or transitional
goals, which, in the audience's opinion, would be easier to reach. For instance:
(4) Three years ago, the Heritage Trust set aside land for the restoration of the
Limberlost Swamp, near Geneva in eastern Indiana. Now, wildlife is returning
to the area. Egrets, ducks and geese now gather at waterfowl resting ponds in
large numbers; and native prairie grass has been planted to return natural di-
versity and other wildlife to the area.
Before the author indicates the final step, which in this case would be "to return
natural diversity ... to the area," he reviews what steps have been taken ("set aside
land for restoration ... native prairie grass has been planted") in the long process
of achieving the final goal "to return natural diversity ... to the area."
The rational appeal of Ideal or Principle (R11) also helps persuade readers. "A
convincing discourse is one whose premises are universalizable, that is acceptable
in principle to all members of the universal audience" (Perelman, 1982, p. 18).
While persuading the audience, the writer should show that his/her argument is
based on a universal principle that is accepted by all members of the audience. In
the fundraising letters, an example of this appeal occurs as follows:
(5) The mission of the Indianapolis Zoological Society is to provide recreational
learning experiences for the citizens of Indiana through the exhibition and
presentation of natural environment in a way to foster a sense of discovery,
stewardship, and the need to preserve the Earth's plants and animals. In short,
the Society is about connecting animals, plants and people.
In this example, the writer places a specific value ("providing recreational
learning experiences ... through the exhibition ... of natural environment") under a
universal value ("connecting animals, plants and people"). If all members of the
audience agree that bringing animals, plants and people together is valuable, then
they will more readily agree on the more specific value of learning about the
environment through the exhibition.
The last rational appeal, Information (R12), also contributes to successful
persuasion. "The speaker must, first of all, be provided with a special selection of
premises (facts) ... The more facts he has at his command, the more easily he will
make the point" (Aristotle, 1932, p. 157). The appeal of Information presents facts
and statistics and gives definiteness to the writer's argument. The writers of
fundraising letters must persuade the audience not by vague generalities, but by
providing the reader with accurate and meaningful numbers. For example:
(6) Through the efforts of about 300 volunteers, nearly $89,900 was raised through
the IUPUI Campus Campaign. Almost 900 of us made new gifts in support of
the things we care about. Together with those who were already donors, there
are over 1,350 staff and faculty supporting the work of IUPUI with their gifts.
The numbers in this paragraph give definiteness to the writer's point and, at the
same time, demonstrate that the writer is knowledgeable on the subject.
To conclude this section, rational appeals are used to target the logical and
rational side of the audience's mind (logos). These twelve arguments are employed
by writers to demonstrate the truth to the reader in a persuasive way. As Perelman
(1982, p. 13) noted, one of the aims of persuasive discourse, and, consequently, of
fundraising discourse, is to make the reader admit the truth and to provoke him to
take an immediate or eventual action. However, one should remember that, apart
from logos, persuasion is also effected through ethos, the character of the writer.
2.2 Credibility appeals (Ethos)
According to Aristotle, the discourse must not only convince through the argument
but must also create a trustworthy image of the speaker.
The character of the speaker is a cause of persuasion when the speech is so uttered as
to make him worthy of belief; for as a rule we trust men of probity more, and more
quickly about things in general, while on points outside the realm of exact knowl-
edge, where opinion is divided we trust them absolutely. (Aristotle, 1932, p. 8)
In fundraising discourse, the writer plays an important role because the goal of
direct mail letters is to elicit a response from the audience in the form of giving
money to a particular non-profit organization. It is almost always the case in direct
mail letters that the organization is represented by the writer. Since the
trustworthiness and reliability of the organization can be a crucial factor in the
donor's decision whether to give money or not, it is the writer's responsibility to
create an image of him/herself and the institution in the letter such that he/she
is thought of as a reliable and unfailing person.
The first credibility appeal is the appeal of First Hand Experience (C13). In
fundraising discourse, it is used as a technique for providing information directly
from the writer's experiences, thus establishing the writer's credibility; it gives the
impression that the writer is knowledgeable and well versed in the subject he/she is
talking about. An example follows:
(7) Purdue has been a part of my life for as long as I can remember. I was raised
in West Lafayette. As I grew older, I realized more and more that Purdue isn't
just a state institution; it is a public university. Moreover, it is a world-class
university.
This example indicates that the writer is a knowledgeable person, who knows and
cares about Purdue University. Thus, the author of the letter tries to create an
impression of him/herself as an individual of intelligence and virtue through the
display of deep respect and gratitude for the place where he/she was educated:
"Purdue isn't just a state institution; it is a public university ... it is a world-class
university."
The next appeal centers on the Writer's Respect for Audience's Interests and
Point of View (C14) and is employed to create the necessary impression of a
good-willed writer in the audience's mind. This appeal often takes the form of the
writer's appreciation for what the donors have done for the organization. For example:
(8) In looking back at the last decade, we at the Indianapolis Zoological Society
(IZS) wish to express our sincere thanks to all companies who have helped us to
achieve so many successes at the Indianapolis Zoo and White River Gardens.
Since the writer is so appreciative of the noble and virtuous deeds of others, the
audience would consider him/her a person of good will.
When a writer acknowledges shared values and ideas that are held with the
audience, this reflects the appeal of Showing writer-audience shared interests and
points of view (C15). Using this appeal, the author builds up solidarity with the
audience by making himself a part of it. For example:
(9) Because if you and I truly want to preserve philanthropy as a way of life, we
must make certain that Americans everywhere take philanthropy seriously,
that they talk about it, debate it, challenge it, and ultimately keep it alive as a
cherished tradition.
The last of the credibility appeals is based on the Writer's Good Character and/or
Judgment (C16). It implies the same Aristotelian ideas of intelligence, virtue, and
good will, but is focused on the creation of the image of the writer. In the case of
this appeal, the author may take a subjective stance to make a judgment. For
example: "Who helps Randy break a cycle of violence and become a better dad? Who
helps Michael, who has spina bifida, learn to talk, dress himself, and get around
independently? Without you, no one." Such a judgment should contribute to the
positive image of the writer in the reader's eye. By making positive comments about
the reader as a helping character, the writer urges the reader to view him/her as a
person of good intentions, because it takes good will to notice and appreciate the
good deeds of others.
To conclude the discussion of credibility appeals (ethos), persuasion cannot be
effective without taking into consideration the role of the writer's image. So far,
we have talked about the role of rationality and credibility in the theory of
persuasion; however, Aristotle defined a third essential aspect of persuasion theory,
namely, emotional or affective appeals (pathos).
2.3 Affective appeals (Pathos)
"Persuasion is effected through the audience when they are brought by the speech
into a state of emotion; for we give very different decisions under the sway of pain
or joy, liking or hatred" (Aristotle, 1932, p. 9). Emotions can serve as an impulse to
take a certain action, and very often the audience will look at the presented case
through the prism of their emotions. As Aristotle mentioned, to an audience that
is eager and hopeful, the proposed object will seem a valuable and worthy thing,
while to an audience that is pessimistic and distrustful the same object will seem
the opposite (p. 91). The following discussion presents the three appeals that are
used in fundraising discourse to target the emotional aspect of the audience's mind.
Appeal to the Audience's Views (A17) arouses emotions in the reader by addressing
his/her attitudinal and moral values. In fundraising discourse, this appeal can
take the form of a direct request to donate, accompanied by a reason. For instance:
"Please make a tax-deductible gift to Community Centers of Indianapolis in 1999, and
know that you are playing an important part in meeting the needs of its community."
In this example, the author makes an emotional appeal to donate followed by a reason
for the donation. The word "tax-deductible" also appeals to the audience's values,
suggesting that the donor may also benefit through a reduced tax bill.
The next affective appeal, Vivid Picture (A18), is very important to persuasion
theory in the sense that it creates the effect of the presence of a reader in a situation
depicted by the writer. Consequently, the writer, trying to persuade the audience,
needs to bring an object as close to the audience as possible. For example:
(10) Do you remember how wonderful and how proud you felt in 1980 when the
young United States Hockey Team beat the powerful Soviet Team 4-3, and
then went on to beat Finland 4-2 for the gold or in 1984 when 16 year old
Mary Lou Retton, needing a 9.95 in her final event to tie for first place in the
all around Gymnastics competition, vaulted her way to the gold by scoring a
perfect 10?
Putting the statement into the form of a question involves the reader, makes him/her
look for the answer, and thus makes him/her present at an event that took place long
ago. Dwelling on the details creates the desired emotions in the reader. Thus, creating
a Vivid Picture is an essential appeal for arousing desired emotions in the reader.
The last appeal in the system, Charged Language (A19), is the appeal that usually
arouses emotions of anger and indignation. The language that is used by the
writer to evince those emotions has a negative connotation. As Aristotle (1932,
p. 122) said, the writer should heighten the effect of his description with fitting
attitudes, tones, and dress. The emotions should be appropriate to the subject, and
if the writer wants the audience to experience anger, he needs to be angry in his
language. For instance: "When it comes to the misuse and destruction of our natural
areas, reality is not only harsh, it is deadly. Once they are developed or altered,
and their fragile ecosystems are disrupted, we lose them forever." Such words as
"misuse," "destruction," "harsh," "deadly," and "lose ... forever" are charged with
negative emotions. By employing such an "angry" description, the writer attempts
to make the audience experience the relevant emotion. Consequently, being in the
relevant emotional condition, the readers might take the relevant action.
3 Analysis, segmentation, and classification
The system of 19 rhetorical appeals was applied to the 245 fundraising letters in
the ICIC corpus. First, a sample of 12 letters was evaluated to ensure a high
coefficient of interrater reliability. Three trained researchers worked separately on
the identification of appeals in their copies of the sample. After the negotiation of
differences in the analysis and finalization of the system, another sample of 50
letters was analyzed to test the level of agreement among the raters (see note 1).
After all the discrepancies were negotiated, the other 183 direct mail letters were
analyzed. Each occurrence of a particular appeal in the letters was identified, coded,
and then manually counted. A sample tagged letter is shown in Table 5B. The results
of the analysis are presented in the following section.
1. The total number of appeals identified was 463. The number of appeals with
disagreement was only 38, which resulted in a high reliability coefficient of r = .92.
Some of the initial definitions of appeals were further refined in order to better
describe fundraising discourse. The following section describes the final system and
the theoretical basis for it.
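The figures in note 1 can be checked with a short calculation. The sketch below assumes that the reported coefficient corresponds to the simple proportion of appeals on which the raters agreed; the note does not state how the coefficient was computed, so this is only one plausible reading.

```python
# Counts reported in note 1 for the 50-letter reliability sample.
total_appeals = 463
disagreements = 38

# Assumption (not stated in the note): agreement is measured as the share
# of identified appeals that the raters coded identically.
agreement = (total_appeals - disagreements) / total_appeals

print(f"Appeals coded identically: {total_appeals - disagreements}")
print(f"Proportion of agreement:   {agreement:.2f}")
```

Under this assumption, 425 of the 463 appeals were coded identically, which rounds to the reported value of .92.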
3.1 Results and discussion
The results of the segmentation and classification are shown in Table 5.2 and Table
5.3. The overall number of appeals in the 245 sample letters was 1,829. Table 5.2
shows the breakdown of numbers and percentages of appeals by appeal type
(rational, credibility, and affective) and by non-profit field. The overall percentage
of rational appeals in all letters was 48%; the corresponding percentages for
credibility and affective appeals were 25% and 28%, respectively. Table 5.2
indicates that the use of rational appeal was quite consistent across all six fields;
however, the high amounts of use in the Health and Human Services and Environment
fields (55% and 47%) were unexpected, since common wisdom in fundraising
suggests more emotional appeals in these fields. Human services letters are
typically seen as appealing to readers through human "sob stories," while
environmental fundraisers are often seen as liberal idealists.
Table 5.2 Fundraising letter appeals: counts and percentages by non-profit field
Field (number of letters)         Rational     Credibility  Affective    Total
                                  Appeals      Appeals      Appeals      Appeals
Health and Human Services (74)    320 (55%)    104 (18%)    153 (27%)    577
Environmental (10)                43 (47%)     21 (23%)     27 (30%)     91
Community Development (10)        34 (49%)     17 (24%)     19 (27%)     70
Education (108)                   316 (44%)    214 (30%)    193 (27%)    723
Arts and Culture (37)             138 (44%)    83 (27%)     91 (29%)     312
Other (6)                         19 (34%)     14 (25%)     23 (41%)     56
All Letters (245)                 870 (48%)    453 (25%)    506 (28%)    1829
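The row percentages in Table 5.2 follow directly from the counts. As a minimal sketch (the variable and function names are illustrative, not part of the original study), they can be recomputed as follows:

```python
# Counts from Table 5.2: (rational, credibility, affective) appeals per field.
counts = {
    "Health and Human Services": (320, 104, 153),
    "Environmental": (43, 21, 27),
    "Community Development": (34, 17, 19),
    "Education": (316, 214, 193),
    "Arts and Culture": (138, 83, 91),
    "Other": (19, 14, 23),
}


def percentages(row):
    """Return each count as a whole-number percentage of the row total."""
    total = sum(row)
    return tuple(round(100 * n / total) for n in row)


# Percentages for each field, and for all letters combined.
shares = {field: percentages(row) for field, row in counts.items()}
overall = percentages(tuple(sum(col) for col in zip(*counts.values())))

for field, pct in shares.items():
    print(f"{field:28s} rational {pct[0]}%, credibility {pct[1]}%, affective {pct[2]}%")
print(f"{'All Letters':28s} rational {overall[0]}%, credibility {overall[1]}%, affective {overall[2]}%")
```

Because each percentage is rounded independently, a row need not sum to exactly 100 (the All Letters row sums to 101).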
Concerning the use of credibility appeals, Table 5.2 shows that Education had the
highest percentage of these appeals (30%), and Health and Human Services the low-
est percentage (18%). The high percentage of credibility appeals in Education re-
flects the relationship between the writer representing educational agencies and the
target audience. Most of the letters in the corpus come from Indiana University
schools, such as the School of Dentistry, the School of Law, and the School of Lib-
eral Arts. These letters were addressed to former students of IU and were, for the
most part, authored by faculty personally acquainted with the addressees. In the let-
ters, the writers stress the interpersonal connection with students so that the stu-
dents would find the information in the letter credible and, thus, more appealing.
As indicated in Table 5.2, 41% of the appeals in the letters representing Other
Organizations were affective appeals, which is considerably higher than in any other
field. The letters in this category represent mainly religious organizations, which
address the audience through the extensive use of affective appeals.
Table 5.3 shows separate counts and percentages for each specific appeal type.
As Table 5.3 indicates, among rational appeals, R8 (Cause/Effect, Means/End,
Consequences) and R12 (Information) had the highest percentages: 10.3% and
17.7%, respectively. The credibility appeal that occurred most often was C14
(Showing Writer's Respect for Audience's Interests and Point of View; 15.4%).
Finally, among affective appeals, the highest percentage occurred with A17
(Appealing to the Audience's Views; 21.2%).
In summary, the results here show that all three major types of appeals are
used: rational, credibility, and affective. However, the extent of the use of these
appeals in the letters is not equal. Rather, the writers of fundraising letters choose,
for the most part, to persuade the audience through the use of the rational appeal. As
a matter of fact, in some of the non-profit fields, it is used almost twice as often as
either credibility or affective appeals.
The prevalence of logos in the letters was a surprising finding. Common wisdom
would expect emotion from fundraisers. However, a previous study (Connor & Upton,
2003) also found that fundraising letters resemble academic writing according to
Biber's multidimensional analysis. They are carefully constructed and polished.
Informal interviews with fundraisers suggest that they want to sound factual and be
taken seriously. Also, it should be noted that the narrative and descriptive examples
in our rating system count as rational appeals. Since they are often rather lengthy,
too, it may be advisable to analyze the data after removing them. It would also be
interesting to see whether variation in the expected audience would change the type of
appeals used. For example, would a younger audience respond better to emotional
appeals than an older one?
As far as the use of individual appeals is concerned, we can conclude that the
most extensively used rational appeals are the ones that provide the audience with
the beneficial results or consequences of a particular philanthropic program (R8),
and those appeals that provide the reader with information about the organization
(R12). The credibility of the organization is most often achieved when the writers
demonstrate appreciation of the donors' past actions (C14) and less often by
stressing organization-donor shared interests and goals (C15). Among the affective
appeals, the emotional appeal to the donors' views and attitudes, which in
fundraising texts takes the form of a direct request for donation, stands out as the
most frequently used individual appeal.

Table 5.3 Individual appeals: counts and column percentages, by non-profit field
Individual  Health and      Environment     Community       Education       Arts and        Other           Total
Appeal      Human Services                  Development                     Culture
R1          23 (4.0%)       1 (1.1%)        2 (2.9%)        7 (1.0%)        2 (0.7%)        0 (0.0%)        35 (1.9%)
R2          15 (2.6%)       2 (2.2%)        3 (4.3%)        2 (0.3%)        1 (0.3%)        0 (0.0%)        23 (1.3%)
R3          9 (1.6%)        3 (3.3%)        1 (1.4%)        16 (2.2%)       7 (2.2%)        0 (0.0%)        36 (2.0%)
R4          2 (0.4%)        0 (0.0%)        0 (0.0%)        5 (0.7%)        5 (1.6%)        0 (0.0%)        12 (0.7%)
R5          11 (1.9%)       1 (1.1%)        0 (0.0%)        4 (0.6%)        2 (0.6%)        2 (3.6%)        20 (1.1%)
R6          26 (4.5%)       3 (3.3%)        3 (4.3%)        19 (2.6%)       10 (3.2%)       0 (0.0%)        61 (3.3%)
R7          6 (1.0%)        3 (3.3%)        5 (7.1%)        20 (2.8%)       9 (2.9%)        4 (7.1%)        47 (2.6%)
R8          74 (12.8%)      7 (7.7%)        4 (5.7%)        67 (9.3%)       34 (10.9%)      2 (3.6%)        188 (10.3%)
R9          1 (0.2%)        0 (0.0%)        0 (0.0%)        4 (0.6%)        0 (0.0%)        0 (0.0%)        5 (0.3%)
R10         20 (3.5%)       2 (2.2%)        3 (4.3%)        46 (6.4%)       7 (2.2%)        2 (3.6%)        80 (4.4%)
R11         18 (3.1%)       0 (0.0%)        3 (4.3%)        16 (2.2%)       2 (0.6%)        0 (0.0%)        39 (2.1%)
R12         115 (19.9%)     21 (23.0%)      10 (14.3%)      110 (15.2%)     59 (18.9%)      9 (16.1%)       324 (17.7%)
R Total     320 (55.5%)     43 (47.3%)      34 (48.6%)      316 (43.7%)     138 (44.2%)     19 (33.9%)      870 (47.6%)
C13         22 (3.8%)       1 (1.1%)        5 (7.1%)        56 (7.8%)       16 (5.1%)       2 (3.6%)        102 (5.6%)
C14         63 (10.9%)      15 (16.5%)      8 (11.4%)       124 (17.2%)     61 (19.6%)      10 (17.9%)      281 (15.4%)
C15         10 (1.7%)       2 (2.2%)        0 (0.0%)        23 (3.2%)       4 (1.3%)        1 (1.8%)        40 (2.2%)
C16         9 (1.6%)        3 (3.3%)        4 (5.7%)        11 (1.5%)       2 (0.6%)        1 (1.8%)        30 (1.6%)
C Total     104 (18.0%)     21 (23.1%)      17 (24.3%)      214 (29.6%)     83 (26.6%)      14 (25.0%)      453 (24.8%)
A17         107 (18.5%)     18 (19.8%)      13 (18.6%)      161 (22.3%)     73 (23.4%)      15 (26.8%)      387 (21.2%)
A18         35 (6.1%)       7 (7.7%)        4 (5.7%)        30 (4.2%)       18 (5.8%)       6 (10.7%)       100 (5.5%)
A19         11 (1.9%)       2 (2.2%)        2 (2.9%)        2 (0.3%)        0 (0.0%)        2 (3.6%)        19 (1.0%)
A Total     153 (26.3%)     27 (29.7%)      19 (27.1%)      193 (26.7%)     91 (29.2%)      23 (41.0%)      506 (27.7%)
Field Total 577             91              70              723             312             56              1829
4 Linguistic description of appeals
In order to explore ways in which appeals are realized linguistically (in this case,
lexically), wordlist, keyword, and concordance analyses were performed for each
rhetorical appeal; keyword data for all of the 19 appeals analyzed in this study are
included in Table 5C. However, the data from all appeals provided more information
than can be adequately discussed in this chapter. Therefore, one frequently
used appeal with compelling keyword data was chosen to be presented as an
illustration of how linguistic (lexical) variation could be analyzed: appeal A17,
appealing to audience's views.
Affective appeals represent 27.7% (506/1829) of the total appeal use in the
ICIC's fundraising letters (see Table 5.2). These appeals play a vital role in
persuading readers by targeting the audience's emotions (Connor & Gladkov, 2004). As
such, appeal A17, appealing to audience's views, which accounts for 76% (387/506)
of all the affective appeals, attempts to arouse readers' emotions by speaking to
their emotional, attitudinal, and moral views. Examples of its use in letters are
shown here:
(11) P.S. Many adults in our community don't enjoy reading the way you and I do.
Won't you help plant the seed of learning in them through a gift to [Health
and Human Services Organization]? (Excerpt taken from letter <102ATL003>
<251> of the ICIC corpus)
(12) If you enjoy reading the stories in the enclosed brochure, there is an excellent
chance that you will enjoy membership in [Arts and Culture Organization].
There has never been a better time to join than right now. [Arts and Culture
Organization] offers more opportunities than ever before to learn about
Indiana history in fun and interactive ways. You can play an important role in
preserving Indiana's history by being an active supporter of [Arts and Culture
Organization's] mission. Start enjoying the benefits of [Arts and Culture
Organization] membership today by completing and returning the enclosed
reply card. Don't forget to choose your free gift when selecting your preferred
membership level. [Arts and Culture Organization] is missing only one
important component...you! We hope you will join us. (Excerpt taken from letter
<601CZL183> <643> of the ICIC corpus)
These samples show how writers use the readers' own views to persuade them,
stressing the intrinsic value of their cause or organization and then urging readers
to support it, thus proving the value of their own emotions, attitudes, or morals. In
fundraising discourse, appeal A17 (appealing to audience's views) is often used when
the writer makes a direct request for a donation while providing a specific reason
that explains why this donation should be made (Connor & Gladkov, 2004).
4.1 Wordlists
The Wordsmith program was used to compile wordlists for the entire ICIC corpus
of fundraising letters, a second reference corpus, and each individual appeal (Scott,
2004a). Wordlists, which show the frequency of every word used in a corpus, are
useful tools for identifying potential differences between reference and specialized
corpora that can later be examined in greater detail (Henry & Roseberry, 1996;
Hunston, 2002).
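To illustrate what a wordlist is, the sketch below builds a simple frequency list in Python. It is not the Wordsmith procedure used in the study: the tokenization rule, the function name `build_wordlist`, and the second sentence of the sample text are assumptions made for illustration. Numerals are collapsed into the symbol #, as in Table 5.4.

```python
import re
from collections import Counter


def build_wordlist(text):
    """Count word frequencies, upper-casing words and mapping numerals to '#'."""
    # Assumed tokenization: alphabetic words (with an optional internal
    # apostrophe) or runs of digits; punctuation is ignored.
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d[\d,.]*", text)
    normalized = ["#" if tok[0].isdigit() else tok.upper() for tok in tokens]
    return Counter(normalized)


# The first sentence is quoted in this chapter; the second is a
# hypothetical addition so that a numeral appears in the sample.
sample = ("Please send in your gift today and help us reach "
          "thousands of Indiana children. Your gift of $25 will help.")

wordlist = build_wordlist(sample)
for word, freq in wordlist.most_common(5):
    print(f"{word:10s} {freq}")
```

A list built this way for a specialized corpus can then be set beside the list for a reference corpus, which is the comparison presented in Table 5.4.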
Table 5.4 Word frequency counts for affective appeals and corpora (see note 2)
A17 (appealing to      ICIC Fundraising      British National
audience's views)      Letter Corpus         Corpus
THE                    THE                   THE
TO                     TO                    OF
OF                     AND                   AND
YOU                    OF                    TO
#                      #                     A
AND                    A                     IN
YOUR                   IN                    #
A                      FOR                   THAT
IN                     YOU                   IS
WILL                   YOUR                  IT
FOR                    OUR                   FOR
GIFT                   THAT                  WAS
OUR                    IS                    I
PLEASE                 WE                    ON
THAT                   ARE                   WITH
HELP                   WITH                  AS
WE                     WILL                  BE
BE                     AS                    HE
THIS                   THIS                  YOU
I                      I                     AT
IS                     HAVE                  BY
WITH                   BE                    ARE
CAN                    AT                    THIS
OR                     SCHOOL                HAVE
SUPPORT                ON                    BUT
2. The symbol # indicates the use of numerals 0-9.

In the present case, the word use in appeal A17 (appealing to audience's views) was
compared to the two other affective appeal categories, as well as two reference
corpora: all fundraising letters in the ICIC corpus, and the British National
Corpus (BNC).
Table 5.4 shows the 25 most frequent words for appeal A17 and in the comparison
corpora. A visual comparison of the reference and appeal wordlists reveals
word frequency differences that characterize the content and range of each corpus.
While the BNC's wordlist is composed entirely of function words such as pronouns,
prepositions, conjunctions, and articles, the wordlist for the ICIC letter
corpus shows more specificity. Words such as "please" appear in the wordlist of
A17, reflecting the appeal's purpose of appealing to the audience's views. The
pronouns "you" and "your" show high frequencies due to their use in addressing readers.
However, while some differences in word frequency can be observed in these
tables, frequency alone does not indicate that a word is characteristic of the ap-
peal. In other words, wordlists represent only a first step in finding differences
between corpora. To determine which words are intrinsic to the language of the
appeal, wordlists of specific corpora must be compared to a reference corpus
wordlist to perform a keyword analysis.
4.2 Keywords
After wordlists were created, a Wordsmith keyword analysis was completed. A keyword
analysis compares the relative frequency of words in a specific group of
texts, in this case an appeal type, with the relative frequency of those words
in a much larger group of texts, in this case the ICIC letter corpus. Thus,
words that occur with a higher relative frequency in the appeal than they do in the
letter corpus are identified as keywords, in that their use represents the lexical tenor
of the appeal (Hunston, 2002; Scott, 2004b). Negative keywords, which occur relatively
less frequently in the appeal than in the corpus, also show the lexical tenor of
the appeal, through their lack of use. The keyword analysis collected data for
frequency, keyness, and significance (p value). The frequency shows how many times a
keyword occurs in a specific category of appeals in the corpus. The keyness of a
keyword is the value of the log-likelihood or chi-square statistic; in other
words, it provides an indicator of a keyword's importance as a content descriptor
for the appeal. The significance (p value) represents the probability that this keyness
is accidental. Therefore, the higher the keyness value and the lower the p value, the
more distinctive a word is for a particular appeal, that is, the more clearly it is
used more frequently in the selected group of texts than in the general corpus. In
the case of negative keywords, which occur less in a certain appeal than in the letter
corpus, a large negative keyness value together with a low p value indicates that the
word is distinctively underused in the appeal relative to the general corpus.
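As an illustration of how a keyness value of this kind can be obtained, the sketch below implements one common two-term formulation of the log-likelihood statistic. The exact computation performed inside Wordsmith may differ, and the function name and the corpus sizes used in the example are hypothetical, since token counts for the appeal and the letter corpus are not given here.

```python
import math


def log_likelihood_keyness(freq_study, size_study, freq_ref, size_ref):
    """Signed log-likelihood keyness of a word in a study corpus.

    freq_* are the word's raw frequencies and size_* the total token
    counts of the study corpus and the reference corpus.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were spread evenly over both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size

    def term(observed, expected):
        # A zero observed frequency contributes nothing to the sum.
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    keyness = 2.0 * (term(freq_study, expected_study) + term(freq_ref, expected_ref))
    # Negative keywords: relatively less frequent in the study corpus.
    if freq_study / size_study < freq_ref / size_ref:
        keyness = -keyness
    return keyness


# Hypothetical sizes: a 20,000-token appeal sub-corpus compared with a
# 200,000-token reference corpus.
print(log_likelihood_keyness(400, 20_000, 1_500, 200_000))  # overused: positive
print(log_likelihood_keyness(3, 20_000, 500, 200_000))      # underused: negative
```

With one degree of freedom, a log-likelihood value above 6.63 in absolute terms corresponds to p < .01 under the chi-square distribution, which is the threshold reported for the keywords in Table 5.5.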
Table 5.5 shows the data collected by a Wordsmith keyword analysis of the
A17 appeals compared to the entire ICIC letter corpus, arranged by semantic func-
tion (described below).
Table 5.5 Keywords in Appeal A17 in order of distinctiveness
Semantic Function                            Key Word       Frequency   Keyness*
Second person pronouns                       your           395         186.9
                                             you            440         156.1
Solidarity between reader and                join           44          37.3
writer/organization                          us             90          27.1
                                             membership     45          26.4
Description of reader's generosity           gift           157         110.1
                                             contribution   82          64.7
                                             donation       38          47.3
                                             check          33          36.9
                                             pledge         33          28.2
Inciting of a response from readers          please         154         159.7
                                             consider       58          59.0
                                             help           135         56.9
                                             today          70          56.7
                                             will           185         40.4
                                             make           86          39.0
                                             hope           59          32.7
                                             send           37          30.8
Incentives for donation or ease of giving    tax            76          89.3
                                             enclosed       75          54.9
                                             card           50          45.0
                                             envelope       46          43.8
                                             return         40          38.7
                                             deductible     29          28.4
                                             receive        36          26.7
Negative keywords                            name           6           -36.0
                                             sincerely      3           -41.1
Multiple categories                          to             632         25.3
* P-values for all keywords are less than .01, indicating a less than 1% probability
that the observed difference is due to chance.
As Table 5.5 shows, "your" is the keyword most characteristic of this appeal's word
use, with a keyness of 186.9. On the other end of the spectrum, the negative
keyword "sincerely" is least characteristic of appeal A17, with a keyness of -41.1.
Appeal A17 contains 28 keywords, which can be grouped into seven categories
by their semantic function. That is, these are words from particular semantic
domains associated with the communicative purposes of this appeal: an appeal to
the audience's emotional, attitudinal, and moral views.
In A17, second person pronouns are used to address the audience's perspectives
and to acknowledge their actions: "Your contribution has been important" and "If
you enjoy reading the stories in the enclosed brochure, there is an excellent chance
that you will enjoy membership." Besides recognizing readers as individuals, A17
also uses words to stress the writer's or organization's solidarity with the audience,
in sentences such as "I urge you to join us as we build better ways to help our
students" and "Please join us, by sending in your membership-application today!"
These keywords indicate that the writer views the reader as a peer, one who would
be a valuable addition to the organization.
As Connor and Gladkov (2004) mention, appeal A17 is often used to directly
request donations. Three different categories of keywords are used in donation
requests: keywords that describe forms of generosity, keywords that incite readers
to donate, and keywords that indicate incentives for donation. The first group of
keywords depicts the forms that readers' generosity can take, using many synonymous
monetary terms. This group of descriptive keywords appears in phrases and
sentences such as: "By making a contribution or a pledge...," "It only takes three
quick, easy steps to obtain a matching gift from your (or your spouse's) employer
to enhance your donation," and "Make your check payable to [Education
Organization]." The second group of keywords contains directives like "consider,"
"make," "send," and "help," along with the use of "please," "today," "hope," and
"will," to urge readers to respond to the letter with a donation to the organization.
Sentences like "I hope you will send in your renewal gift today," "Please consider
sending your donation in today," "You can make the difference today with your check,"
and "Please send in your gift today and help us reach thousands of Indiana children"
directly appeal for donations from readers, using language that requests a response.
The third group of keywords is used in sentences that illustrate the ease of
donating or offer incentives for donating: "Return the completed card with your
check by October 15 to receive an invitation to a special artist dinner on
November 8," "Your gift, made through [Education Organization], is tax-deductible,"
and "Just fill out the enclosed pledge card and send it in the return envelope today."
The keywords "return," "enclosed," "card," and "envelope" depict how easy making a
donation will be for readers; they can simply fill out and return the paperwork
enclosed in the letter. Additionally, "receive," "tax," and "deductible" all provide
incentives for donating; donors might receive a gift or tax deductions.
Appeal A17 also has two negative keywords: "name" and "sincerely." In other
appeal types, the keyword "name" serves as the placeholder at the beginning of the
letter for the heading and greeting (e.g. "Dear name,"), which will later be
personalized for each recipient. On the other hand, "sincerely" is used as a closing
salutation at the end of the letter. The negative keyness of these two words indicates
that appeal A17 does not appear at the beginning or end of fundraising letters; it is
distinct from the greeting and closing salutations.
The last of A17's keywords in Table 5.5, "to," is more difficult to account for,
because it can be associated with several semantic and grammatical categories, as
it appears in conjunction with pronouns, words that describe readers' generosity,
incentives for donation, and expressions of solidarity. The sentence "Please
consider a gift to [Arts and Culture Organization] before December 31 to receive full
tax deductibility for this year" shows two instances of "to" within appeal A17.
Other occurrences of the word can be seen in phrases like "gift to the," "your gift
to," "a contribution to," and "you to join," which are used throughout the appeal.
5 Appeals and discourse structure of letters
Appeals and moves represent two complementary top-down approaches to discourse.
As such, it is useful to compare the distribution of these two features in the
letter discourse. Therefore, a preliminary analysis compared the distribution in 50
randomly selected letters from the corpus. Table 5.6 presents the tabulation of
appeals used by each move type. The table includes each occurrence of a specific
appeal in the identified move types in these 50 letters. As can be seen, in most
cases, each move type consisted of multiple appeals. For example, occurrences
of Move Type 3 included 15 different appeal types.
The results suggest that the occurrence of certain appeals in certain move
types is somewhat predictable. For example, rational appeals tend to be placed in
Move Types 2 and 3, towards the beginning of the letter, with the exception of ap-
peals R8 and R12, which occur also in some of the later moves in the letters. It is
interesting, and not surprising, that appeals C14 and A17 are sprinkled throughout
the letter. The writer shows respect for the audience's interests (expressed often
as a thank you for previous contributions) and appeals to his/her views (expressed
as a request for a donation) throughout the letter.
Table 5.6 Placement of appeals in each move type in 50 randomly selected letters
Appeal          Move 1  Move 2  Move 3  Move 4  Move 5  Move 6  Move 7
R1                   1       9       0       0       0       0       0
R2                   0       7       0       0       0       0       0
R3                   0       5       3       0       0       0       0
R4                   0       1       0       0       0       0       0
R5                   0       6       1       0       0       0       0
R6                   0       5       7       0       0       0       0
R7                   3       6       3       0       0       0       2
R8                   0      19      24       2       1       2       0
R9                   0       1       1       0       0       0       0
R10                  0      15       5       0       0       0       0
R11                  0       7       1       0       1       0       0
R12                  2      53      23       2       8       0       0
C13                  2      18       4       1       1       0       0
C14                  0      14      33       1       3      26       2
C15                  0       3       4       0       0       0       1
C16                  1       4       4       0       0       0       0
A17                  2      11      85      23      16       6       1
A18                  2       9       1       1       0       0       0
A19                  0       0       0       0       0       0       0
Total Appeals       13     193     199      30      30      34       6
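The figures reported for Move Type 3 can be rechecked directly from the Move 3 column of Table 5.6. The following Python sketch is offered only as an illustration of that arithmetic; the variable names are our own and are not part of the original analysis.

```python
# Move 3 column of Table 5.6: occurrences of each appeal within Move Type 3.
move_3 = {
    "R1": 0, "R2": 0, "R3": 3, "R4": 0, "R5": 1, "R6": 7, "R7": 3,
    "R8": 24, "R9": 1, "R10": 5, "R11": 1, "R12": 23,
    "C13": 4, "C14": 33, "C15": 4, "C16": 4,
    "A17": 85, "A18": 1, "A19": 0,
}

# Column total, which should match the "Total Appeals" row.
total_appeals = sum(move_3.values())

# Appeal types that occur at least once in this move type.
distinct_types = [appeal for appeal, n in move_3.items() if n > 0]

# The single most frequent appeal in this move type.
most_frequent = max(move_3, key=move_3.get)

print(total_appeals, len(distinct_types), most_frequent)  # prints: 199 15 A17
```

The same check can be repeated for any other column by substituting its values.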
The results also point to the benefits of studying discourse structures from multi-
ple perspectives. For example, Chapter 3 (Table 3.10) shows how only two Move
Types (2 and 3) were predominant in the letters, accounting for about 65% of all
the moves in these texts. The present analysis, however, shows that appeals reflect
different functional considerations from moves; these same two move types con-
sist of many different types of appeals (all appeal types except A19). At the same
time, a single appeal type is distributed across multiple move types.
All in all, appeals are the ways in which ideas are expressed to persuade the
reader. Any of the moves that were identified in the letters can in theory be ex-
pressed through any of the three broad types of appeals (rational, emotional, cred-
ibility). These broad types of appeals can be expressed through a choice of indi-
vidual appeals.
6 Conclusion
This chapter has explored the purpose and characteristics of rhetorical appeals in
fundraising letters. The characterization and use of appeals can be dated back to
Aristotle, and has since been extended and enhanced by rhetoricians and philoso-
phers. Our study approached the study of appeals as a top-down discourse analysis
of fundraising letters. After the set of possible appeal types was determined, the
letters were segmented and categorized into appeals. Linguistic analyses were con-
ducted to gain a better understanding of the typical linguistic category of each
appeal. A closer analysis of one appeal type, A17, served as an example of the
characteristics of word usage within individual rhetorical appeals from fundraising
letters. Wordlists show how A17's word frequencies diverge from the word
frequencies of other affective appeals, and of reference corpora. By examining the
presence and functions of keywords, the purpose and tenor of the appeal are
revealed through its most characteristic words. Through this investigation, appeals
become defined as distinct elements of rhetorical structure.
The analysis of rhetorical appeals provides a complementary perspective to
the study of fundraising letters. Also a top-down approach to discourse, appeals
analysis differs from rhetorical moves analysis in that its function is to reveal the
persuasive roles of text sections, not the communicative or informative ones, as is
the case with moves. It should be noted that the sequencing of appeals in letters is
not as predictable as that of moves. However, preliminary analysis revealed inter-
esting patterns about the placement of the most frequently occurring appeals. Fur-
ther analyses should continue exploring the placement of all the appeals for a more
comprehensive understanding of their discourse organizational tendencies.
Table 5A Definitions and examples of rhetorical appeals
Rational
R1 Descriptive Example
Using a compelling descriptive example from one's own or someone else's experience
Families are being torn apart, and too often, children are the victims. Kids like Tommie
J., made a ward of the court because of repeated beatings by an alcoholic father; Alice, sent to a
group home to get help because of severe behavior disorders; and John H., a recovering alco-
holic, rebuilding a relationship with his family so they can live together again.
R2 Narrative Example
Using a compelling narrative example. Must contain a beginning, middle, and end of a story
Ted is a single father with three children under 10. He's never been on welfare and he's always
had a job doing manual labor... There was a time when he felt like he had no choice but to tolerate
his wife's constant abuse and neglect of their children. Then Ted decided the children
deserved a chance to start over in another town, no matter how difficult it might prove to be.
R3 Classification
Placing in a class or unit, and describing what that means

In joining SCS, you join the ranks of those who believe that bringing art and art education to the
city makes life better, richer, and more rewarding for the entire community.
R4 Comparison
Using comparison to support one's focus
Our faculty-student ratio is 1:27. For law schools in the United States, the range of faculty-stu-
dent ratios is from 1:13 to 1:35, but well over half of the law schools in the country have better
ratios than we do.
R5 Contrast
Using contrast to support one's focus
Unfortunately, our view of the importance of philanthropy is not shared by all Americans. Many
see philanthropy as no more than the grand gestures of the rich. They do not understand, as you
do, that the museums, parks, hospitals and community organizations supported by philanthro-
py are the cornerstones of our very quality of life.
R6 Degree
Arguing that two things are separated by a difference of degree rather than kind, or making an ap-
peal for an incremental change
Please consider an increase in your contribution to the Girls Scout Annual Campaign.
R7 Authority
Using the authority of a person other than the writer

Pat LaCrosse asked me to send this information inviting you to join the Georgia O'Keeffe Circle
of the Indianapolis Museum of Art's Second Century Society (SCS).
R8 Cause/effect Means/End Consequences

Showing how one event is the cause of another
As one of only a few zoos in the country that receives no local, state, or federal tax support IZS
must depend on donations for general operating funds from corporations like yours
R9 Model
Proposing a model for action that relies on existing programs

A group of your colleagues recently volunteered to help set the priorities for this campaign. They
surveyed members of the staff and faculty councils, administrators and others and learned that
we at IUPUI have a number of vital concerns.
R10 Stage in process
Reviewing previous steps and looking forward to what steps need to be taken
Three years ago, the Heritage Trust set aside land for the restoration of the Limberlost Swamp,
near Geneva in eastern Indiana. Now, wildlife is returning to the area. Egrets, ducks and geese
now gather at waterfowl resting ponds in large numbers; and native prairie grass has been plant-
ed to return natural diversity and other wildlife to the area.
R11 Ideal or Principle
As our state continues to develop, we must work harder to protect important
natural areas for wildlife and recreation.
R12 Information
Using supporting facts and statistics

Through the efforts of about 300 volunteers, nearly $89,000 was raised through the IUPUI Cam-
pus Campaign. Almost 900 of us made new gifts in support of the things we care about. To-
gether with those who were already donors, there are over 1,350 staff and faculty supporting the
work of IUPUI with their gifts.
Credibility
C13 First hand experience
Providing information to show first hand experience or some authority on the subject
Purdue has been a part of my life for as long as I can remember. I was raised in West Lafayette.
As I grew older, I realized more and more that Purdue isn't just a state institution; it is a public
university. Moreover, it is a world-class university.
C14 Showing writer's respect for audience's interests and point of view
In looking back at the last decade, we at the Indianapolis Zoological Society (IZS) wish to ex-
press our sincere thanks to all companies who have helped us to achieve so many successes at the
Indianapolis Zoo and White River Gardens.
C15 Showing writer-audience shared interests and points of view
Because if you and I truly want to preserve philanthropy as a way of life, we must make certain
that Americans everywhere take philanthropy seriously, that they talk about it, debate it, chal-
lenge it, and ultimately keep it alive as a cherished tradition.
C16 Showing writer's good character and/or judgment
Who helps Randy break a cycle of violence and become a better dad? Who helps Michael, who
has spina bifida, learn to talk, dress himself, and get around independently?
Without you, no one.
Affective
A17 Appealing to the Audience's views (emotional, attitudinal, moral)
Please, make a tax-deductible gift to Community Centers of Indianapolis in 1999, and know that
<company> is playing an important part in meeting the needs of its community.
A18 Vivid picture
Creating a thought, a mind's eye vision.
Do you remember how wonderful and how proud you felt in 1980 when the young United States
Hockey Team beat the powerful Soviet Team 4-3, and then went on to beat Finland 4-2 for the
gold or in 1984 when 16 year old Mary Lou Retton, needing a 9.95 in her final event to tie for
first place in the all around Gymnastics competition, vaulted her way to the gold by scoring a
perfect 10?
A19 Charged language
Using strong language to arouse emotions.
When it comes to the misuse and destruction of our natural areas, reality is not only harsh, it is
deadly. Once they are developed or altered, and their fragile ecosystems are disrupted, we lose
them forever.
Table 5B Sample Fundraising Letter with Appeals Indicated*
Dear Mrs. Name,

<begin R2> Ted is a single father with three children under 10. He's never been on welfare and
he's always had a job doing manual labor. Life isn't easy for Ted, but he's determined to raise his
children himself and be a good role model for them. It wasn't always like that. There was a time
when he felt like he had no choice but to tolerate his wife's constant abuse and neglect of their
children. Then Ted decided the children deserved a chance to start over in another town, no
matter how difficult it might prove to be. <end R2>
<begin A18> What happens when people are living on the edge -- barely able to survive -- and
an unexpected emergency arises? What happens if they can't pay their heating or electric bill
because of extreme temperatures? What happens if their children get sick? What happens if they
get sick? <end A18>
<begin R11> Starting over can be very hard -- especially for people who don't have family or
financial resources to draw from in an emergency. <end R11>
<begin C14> That's why I'm writing to you today. Through your financial support, you've already
demonstrated that you want to help genuinely needy people begin anew. So, I'd like to give
you one more way to make a difference: The [Health and Human Services Organization] Card.
<end C14>
<begin R12> Although it can't be used to make purchases or withdraw money from an ATM, the
value of this Card could be immeasurable -- for the person whose life it might change.
Here's how it works: Simply detach the Care Card from the top of this letter and keep it handy.
Then, if you know of or come across someone who needs our help, please give the card to him
or her. It shows our address and phone number in [City], but through our Service Extension
Office, we can put the person in contact with the volunteer representative in [County]. We'll
welcome their call and do our best to help them through the difficult time, so they can get on
with their lives. <end R12>
<begin A17> And just as important, I invite you to renew your partnership with [Health and
Human Services Organization] by sending a contribution, once again, today. Your donation of
$25, $50, $125, $250 or more will help us remain a steady source of assistance for people like
Ted. <end A17>
<begin C14> Thank you for your continued financial support, and for remaining alert for neigh-
bors who need a helping hand. God bless you for your compassion and kindness. <end C14>
Blessings,
[Writer's Name]
Major
DIVISIONAL COMMANDER
<begin A17> P.S. Your gift today will help us care for struggling families, hungry children, and
lonely and needy senior citizens right here in this community. <end A17>
* This sample is from letter <108AXL026> <322> of the ICIC Fundraising Corpus
Table 5C Keyword data for all appeals*
R1
Key Word Frequency Keyness*
she 19 52.02
job 17 46.28
I 44 34.44
her 16 34.40
was 22 28.30
my 18 27.66
[Health and Human Services Organization] 11 26.58
his 17 25.89
you 4 44.44
# 14 63.55
R2
Key Word Frequency Keyness
he 58 175.91
his 43 97.01
was 48 86.26
miyares 21 63.63
job 20 45.43
him 17 43.61
her 22 43.15
she 20 42.41
Wanda 11 36.42
blind 10 36.03
had 18 34.94
baby 12 34.70
but 27 31.92
disabled 15 31.09
were 19 29.15
mother 10 26.25
Wandas 7 24.86
will 5 25.32
are 4 30.55
you 10 50.92
# 38 51.89
your 3 59.51
R6
# 197 186.08
or 28 24.88
R7
his 16 28.26
dr 10 24.93
# 14 43.85
R8
provide 47 34.35
you 76 42.35
I 12 62.27
R10
faculty 43 40.05
gift 8 25.55
I 16 41.03
your 28 76.03
you 32 107.08
R12
us 20 24.36
please 10 32.54
name 6 44.63
gift 13 48.63
your 74 84.98
you 89 118.38
I 13 120.28
C13
I 136 214.10
my 64 164.02
am 20 34.11
was 31 31.13
your 12 35.50
C14
you 315 290.51
thank 89 202.96
your 217 165.47
call 32 64.37
support 88 62.37
if 50 48.07
questions 22 44.33
any 32 42.72
have 85 40.48
I 86 34.19
thanks 17 30.33
ext 11 28.34
forward 15 28.14
appreciate 14 27.85
for 156 26.88
look 17 26.11
me 26 25.04
advance 12 24.31
C15
we 32 33.87
# 4 33.03
C16
proud 6 27.91
A17
your 395 186.94
please 154 159.72
you 440 156.06
gift 157 110.11
tax 76 89.30
consider 58 59.02
help 135 56.89
today 70 56.71
enclosed 75 54.94
donation 38 47.31
card 50 44.95
envelope 46 43.78
will 185 40.44
make 86 39.03
return 40 38.68
join 44 37.25
check 33 36.88
hope 59 32.65
send 37 30.84
deductible 29 28.39
pledge 33 28.24
us 90 27.07
receive 36 26.71
membership 45 26.37
to 632 25.27
name 6 36.04
sincerely 3 41.12
A18
its 14 33.88
what 16 31.90
* The reference corpus used in this comparison included all fundraising letter texts. Appeals
A19, R3, R4, R5, R9 and R11 did not have keyword results. P-values for all keywords are less
than .01, indicating that there is a less than 1% danger of error in the calculations. Keywords
preceded by have negative keyness.
Part 2
Bottom-up analyses of discourse organization

chapter 6
Introduction to the identification and analysis of vocabulary-based discourse units
WITH Eniko Csomay, James K. Jones, & Casey Keck
As noted in Chapter 1, one major analytical issue for any attempt to combine
corpus-linguistic and discourse-analytic research perspectives is to decide on
a unit of analysis with a well-defined linguistic basis. In early corpora (such as
the Brown Corpus and the LOB Corpus), the unit of analysis was a text file,
containing a segment of a fixed length (e.g., 2,000 words) extracted from a text.
Corpora of this type have been extremely useful for functional investigations
of grammatical features, but they were not suitable for discourse studies. More
recently, corpora have been constructed from complete texts, such as chapters,
research articles, newspaper articles, or even complete books.
However, there is often extensive linguistic variation within a text, associated
with internal shifts in communicative task, purpose, and topic. One of the first
hurdles for a corpus-based investigation of discourse structure is to identify
the units that comprise texts. In some written genres, text-internal discourse
units can be readily identified because they are marked by sections (in academic
articles) or chapter breaks (in textbooks). However, even a discourse unit like
a book chapter is likely to have systematic internal variation, associated with
shifts in topic or purpose. Other kinds of texts, like a newspaper editorial or a
conversation, have no overt markers of internal discourse units. Thus we need
methods to determine the structural units of discourse in each kind of text, and
to identify the boundaries of those units.
The chapters in Part I of this book addressed this methodological problem
by segmenting texts into moves or appeals, following a top-down analytical
approach. In Part II of the book, we explore a complementary bottom-up
approach. As described in Section 3 of Chapter 1 (see especially Table 1.2), the
first step in a bottom-up approach is to automatically segment all texts in the
corpus into well-defined discourse units, based on linguistic criteria. The specific
method that we adopt here relies on analysis of vocabulary patterns within texts,
identifying a discourse unit boundary when the text shifts to a new set of words.
We thus refer to these units as Vocabulary-Based Discourse Units (VBDUs).
Two overall analytical goals govern the bottom-up approach to
discourse organization developed here. First, the approach should provide a
comprehensive linguistic description of discourse units and the flow of discourse
within texts. Second, the approach should describe generalizable patterns of
discourse organization that hold across all texts of the target corpus. As noted
above, the first step required to achieve these goals is to automatically segment
texts into discourse units.
In the following chapters, the construct of the VBDU is used for these
purposes. The present chapter introduces VBDUs and the analytical techniques
required to segment texts into these discourse units. The following two chapters,
then, illustrate the application of this approach to the analysis of discourse
organization in texts from written research articles and spoken university lectures.
1 Conceptual introduction to VBDUs
Conceptually, a Vocabulary-Based Discourse Unit (VBDU) is a block of discourse
defined by its reliance on a particular set of words. The boundary of a VBDU is identified
as the place in a text where the author/speaker switches to a new set of words.
Because the topic of discourse is expressed through vocabulary, VBDUs can usually
be interpreted as topically-coherent units. However, it also often turns out that the
author/speaker shifts communicative purpose from one VBDU to the next.
An easy way to illustrate the correspondence between shifts in vocabulary and
topical discourse units is through consideration of the major sections of a research
article. There are almost always major shifts in vocabulary between the Introduc-
tion and Methods sections of research articles. For example, Text Excerpt 6.1 com-
pares the end of the Introduction to the beginning of the Methods from a bio-
chemistry research article (taken from the Biochemistry Research Corpus, see
Chapter 4).
Text Excerpt 6.1. Introduction and Methods sections from a Bio-chemistry re-
search article. (MBCSep) [Underlined words do not occur in the adjacent dis-
course unit]
INTRODUCTION
[...]
[VBDU 4]
Drosophila early embryos undergo a morphologically intermediate mitosis in
which pore complexes disassemble during prophase and prometaphase, leav-
ing behind open holes, whereas nuclear membranes remain largely intact and
the lamina partially disassembles: some lamins delocalize to the cytoplasm,
but a fraction of them remain in place through early-mid anaphase (Ref.). To
begin determining the functions of LEM domain proteins in vivo, we chose
the genetically tractable nematode C. elegans. We report here the identification
and characterization of the LEM domain proteins MAN1 and emerin in
C. elegans and the discovery that the timing of nuclear envelope breakdown
may be unique in C. elegans relative to other studied eukaryotes.
METHODS
[VBDU 5]
To obtain polyclonal antibodies against Ce-MAN1 and Ce-emerin, mice and
rabbits were immunized at 3-week intervals with synthetic peptides conju-
gated to keyhole limpet hemocyanin. Immunizations and serum production
were performed by Covance Research Products. The following keyhole limpet
hemocyanin-conjugated peptides were used: CAVWKWIGNQSQKRW-
COOH, which corresponds to the last 14 residues of Ce-MAN1 plus an N-
terminal Cys residue; and CQLKLVAETNPEDTI-COOH, which corresponds
to the last 14 residues of emerin plus an N-terminal Cys residue. All peptides
were synthesized, purified by reverse-phase HPLC with the use of a C18 ana-
lytical column, and conjugated to keyhole limpet hemocyanin by Boston Bio-
molecules. Rabbit polyclonal antibodies to Ce-lamin were produced against a
bacterially expressed polypeptide consisting of residues D-217 to F-550 of
lamin and were affinity purified. mAb414, which recognizes a subset of nucle-
oporins, was purchased from BAbCO. mAb104, which recognizes conserved
small nuclear ribonucleoproteins (Ref.), was provided by Dr. Geraldine Sey-
doux. Cy3- conjugated goat anti-mouse and goat anti-rabbit antibodies, and
FITC-conjugated goat anti-rabbit antibodies, were purchased from Jackson
Laboratories. mAbs against tubulin were purchased from Sigma Chemical.
The vocabulary used in these two discourse units is dramatically different. There
are only three content words shared by both the introduction and methods sec-
tions: emerin, lamin, and nuclear. These words are marked in italics in the above
extract. All other content words, and many of the function words, are unique to
one or the other of these two sections; these words are underscored in the above
extract. (Function words common to both sections are in plain text.) This extract
illustrates the dramatic way in which vocabulary can shift at the boundary of text-
internal discourse units, marking a shift in topic. In this case, the boundary also
marks a distinctive shift in communicative purpose: the introduction provides
background information and an overview of the study; the methods provide the
details of the actual procedures. Interestingly, these different communicative pur-
poses are sometimes associated with different sets of function words in addition to
content words. For example, the first discourse unit in Text Excerpt 6.1 relies on
the pronoun we and the preposition in, while the second discourse unit uses past
tense was/were and the prepositions with, from, and by.
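The comparison described above, separating words shared by two adjacent discourse units from words unique to one of them, can be sketched with set operations. The following Python fragment is a minimal illustration, not the procedure used in the study: the tokenizer is a simplifying assumption, no distinction is drawn between content and function words, and the two strings are shortened illustrative stand-ins for the Introduction and Methods units in Text Excerpt 6.1.

```python
import re

def word_types(text):
    """Return the set of lowercased word types in a text.

    The regular expression is a deliberately simple tokenizer (an assumption);
    it keeps hyphenated forms such as 'anti-rabbit' together.
    """
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()))

def compare_vocabulary(unit_a, unit_b):
    """Split the word types of two adjacent units into shared and unit-specific sets."""
    types_a, types_b = word_types(unit_a), word_types(unit_b)
    return {
        "shared": types_a & types_b,   # used in both units
        "only_a": types_a - types_b,   # unique to the first unit
        "only_b": types_b - types_a,   # unique to the second unit
    }

# Shortened, illustrative stand-ins for two adjacent units.
intro = "We report here the identification of emerin in C. elegans"
methods = "Antibodies against emerin were purchased from Sigma Chemical"

result = compare_vocabulary(intro, methods)
print(sorted(result["shared"]))
```

In this toy comparison only emerin is shared; every other word type falls into one of the two unit-specific sets, mirroring on a small scale the pattern seen in the full excerpt.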
But VBDUs are not necessarily restricted to the segments of a text that are
overtly marked by paragraphs or section breaks. That is, because the methodology
for identifying VBDU boundaries relies entirely on distinctive vocabulary, these
discourse units sometimes reflect relatively subtle shifts in topic within ortho-
graphically marked sections.
For example, Text Excerpt 6.2 shows the last two VBDUs from the Introduc-
tion of this same research article. (Note that the second of these is repeated from
Text Excerpt 6.1 above.) Because they are both from the Introduction, these two
discourse units share more vocabulary than what we saw for the two VBDUs in
Text Excerpt 6.1 above; shared words are marked in italics. However, most words
in these two units are unique to one or the other VBDU; the unique words are
underscored.
Text Excerpt 6.2. The last two VBDUs from the Introduction of a Bio-chemistry
research article. (MBCSep) [underscored words are unique to a VBDU; italicized
words are used in both VBDUs]
INTRODUCTION
[...]
[VBDU 3]
In mammals, the nucleus is completely disassembled during mitosis, a process
known as open mitosis (Ref.). The lamina depolymerizes, and nuclear mem-
branes disperse into the endoplasmic reticulum network during prometaphase
(Ref.). Physical disruption of the nuclear envelope, caused by spindle micro-
tubules during mid-late prophase (Ref.), may also contribute to the release of
intranuclear contents. By metaphase, the vertebrate nuclear envelope is com-
pletely disassembled. The envelope reassembles onto chromosomes during
late anaphase and telophase (Ref.). Lamina-associated polypeptide 2, lamin B
receptor, and lamins have been proposed to help target reforming nuclear
membranes to chromosomes or to mediate nuclear envelope assembly or
growth (Ref.). The open mitosis of higher eukaryotes contrasts with the
closed mitosis of single-celled eukaryotes such as Saccharomyces cerevisiae
(Ref.). During closed mitosis, the nucleus remains intact and chromosomes
are segregated by an intranuclear spindle apparatus.
[VBDU 4]
Drosophila early embryos undergo a morphologically intermediate mitosis in
which pore complexes disassemble during prophase and prometaphase, leav-
ing behind open holes, whereas nuclear membranes remain largely intact and
the lamina partially disassembles: some lamins delocalize to the cytoplasm,
but a fraction of them remain in place through early-mid anaphase (Ref.). To
begin determining the functions of LEM domain proteins in vivo, we chose
the genetically tractable nematode C. elegans. We report here the identification
and characterization of the LEM domain proteins MAN1 and emerin in
C. elegans and the discovery that the timing of nuclear envelope breakdown
may be unique in C. elegans relative to other studied eukaryotes.
Several of these distinctive words are repeated within a VBDU, but not used in the
adjacent VBDU; these words are bold underscored in Text Excerpt 6.2. For exam-
ple, the words nucleus, spindle, chromosomes, and closed are repeated in VBDU 3
but not used in VBDU 4. In contrast, LEM domain proteins and elegans are used
repeatedly in VBDU 4 but not used at all in VBDU 3. These repeated words that
are unique to a VBDU give a direct indication of the distinctive topic of the dis-
course unit in contrast to adjacent VBDUs.
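Words that are repeated within one VBDU but absent from its neighbor can be listed by combining frequency counts with a membership test. The sketch below is only an illustration under stated assumptions: the tokenizer, the threshold of two occurrences (`min_count`), and the function names are ours, and the two strings are abbreviated stand-ins for VBDU 3 and VBDU 4.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def repeated_unique_words(unit, adjacent_unit, min_count=2):
    """Return words used at least min_count times in `unit` and never in `adjacent_unit`."""
    counts = Counter(tokenize(unit))
    adjacent_types = set(tokenize(adjacent_unit))
    return {word: n for word, n in counts.items()
            if n >= min_count and word not in adjacent_types}

# Abbreviated, illustrative stand-ins for two adjacent VBDUs.
vbdu_3 = ("During closed mitosis, the nucleus remains intact and chromosomes "
          "are segregated by a spindle. The envelope reassembles onto chromosomes.")
vbdu_4 = ("We chose the nematode C. elegans. The timing of envelope breakdown "
          "may be unique in C. elegans.")

print(repeated_unique_words(vbdu_3, vbdu_4))
print(repeated_unique_words(vbdu_4, vbdu_3))
```

Note that the simple tokenizer splits "C. elegans" into two tokens, so the abbreviation "c" is reported alongside "elegans"; a fuller treatment would handle such multi-word names explicitly.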
Another example of this type comes from the first two VBDUs in the Methods
section from the same research article. Text Excerpt 6.3 shows these two VBDUs,
highlighting the words that are unique to one VBDU but used repeatedly within
that VBDU. In this case, the VBDU division corresponds to a major division in the
procedures used in the study: VBDU 5, the first VBDU in the Methods section,
describes the process used to obtain polyclonal antibodies, using peptides conju-
gated to keyhole limpet hemocyanin. This VBDU identifies procedures carried out
by other labs and the materials that were purchased from those labs. In contrast,
the ensuing VBDU (VBDU 6) provides a detailed description of the procedures
used for immunostaining. These procedures include the preparation of slides, us-
ing PBST and PBS to wash, incubate, and/or dilute the preparations, for specified
periods of time (hours or minutes). Although these two VBDUs both come from
the Methods section, they describe different steps in the procedure and different
kinds of methodologies. Here we see how the shift in vocabulary associated with
VBDUs corresponds to the textual shift in topic and purpose.
Text Excerpt 6.3. The first two VBDUs from the Methods of a Bio-chemistry re-
search article. (MBCSep) [bold underscored words are unique to a VBDU but
used repeatedly within that VBDU]
METHODS
[VBDU 5]
To obtain polyclonal antibodies against Ce-MAN1 and Ce-emerin, mice and
rabbits were immunized at 3-week intervals with synthetic peptides conju-
gated to keyhole limpet hemocyanin. Immunizations and serum production
were performed by Covance Research Products. The following keyhole limpet
hemocyanin-conjugated peptides were used: CAVWKWIGNQSQKRW-COOH,
which corresponds to the last 14 residues of Ce-MAN1 plus an N-terminal
Cys residue; and CQLKLVAETNPEDTI-COOH, which corresponds
to the last 14 residues of emerin plus an N-terminal Cys residue. All peptides
were synthesized, purified by reverse-phase HPLC with the use of a C18 ana-
lytical column, and conjugated to keyhole limpet hemocyanin by Boston Bi-
omolecules. Rabbit polyclonal antibodies to Ce-lamin were produced against
a bacterially expressed polypeptide consisting of residues D-217 to F-550 of
lamin and were affinity purified. mAb414, which recognizes a subset of nu-
cleoporins, was purchased from BAbCO. mAb104, which recognizes con-
served small nuclear ribonucleoproteins (Ref.), was provided by Dr. Geral-
dine Seydoux. Cy3- conjugated goat anti-mouse and goat anti-rabbit
antibodies, and FITC-conjugated goat anti-rabbit antibodies, were purchased
from Jackson Laboratories. mAbs against tubulin were purchased from Sigma
Chemical.
[VBDU 6]
Immunostaining was performed essentially as described (Ref.). Mixed-stage
animals or isolated wild-type adult C. elegans were placed on polylysine-
treated slides, and 60-mm coverslips were placed above the nematodes. The
slides were placed in liquid N2 or dry ice, and the coverslips were immedi-
ately removed. The nematodes were fixed for 4 minute at 20C in methanol
and then incubated for 30 minute at 2224C in PBST containing 3.7% for-
maldehyde. Nematodes were then washed once in PBST, incubated for 10
minute at room temperature in PBST containing 5% nonfat dry milk, washed
once again with PBST, and incubated overnight at 4C with the primary anti-
body diluted in PBST. Excess primary antibody was removed by washes in
PBST: once for 1 minute, once for 10 minute, and twice for 30 minute each.
The nematodes were then incubated for 2 hour at 22C with the Cy3-conju-
gated goat anti-rabbit antibodies or Cy3- conjugated goat anti-mouse antibod-
ies diluted in PBST. Double-label immunostaining for small nuclear ribonu-
cleoproteins and Ce-lamin was performed as follows. Animals were first
stained with antibodies to Ce-lamin, followed by FITC-conjugated anti-mouse
secondary antibody, and then washed in PBST; the animals were then incu-
bated for 2 hour at 22C with mAb104 or anti-tubulin antibodies, rewashed as
described above, and incubated for 2 hour with Cy3- conjugated anti-mouse
antibodies. For both double- and single-label immunostaining, excess sec-
ondary antibody was then removed by washes in PBST: once for 1 minute,
once for 10 minute, and twice for 30 minute each. Nematodes were then in-
cubated for 10 minute in PBS containing 1 g/ml Hoechst 33258, washed
once with PBS, and mounted in glycerol containing 2% n-propyl gallate.
2 Automatic identification of VBDUs in texts
The computational methods used to automatically identify Vocabulary-Based Discourse
Units are based on Hearst's (1994; 1997) TextTiling procedure. Conceptually,
this is a quantitative procedure that compares the words used in adjacent segments
of a text. If the two segments use the same vocabulary to a large extent, they
are analyzed as belonging to a single discourse unit. However, when the two segments
are maximally different in their vocabulary, they are analyzed as two distinct
VBDUs.
The TextTiling program processes texts through two 50-word windows. The
windows move through the text one word at a time, and at each point, the program
compares the 50 words in the first window with the words in the second window.
For example, the program begins by comparing the vocabulary in words 1-50
from a text to the vocabulary in words 51-100. The windows then advance one
word, comparing the words 2-51 to words 52-101. These comparisons continue
until the entire text is processed.
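The movement of the two windows can be sketched as a simple generator. This is a minimal illustration rather than the program used in the study: the function name is hypothetical, and the decision to stop once the second window can no longer be filled with 50 words is our assumption, since the treatment of the end of a text is not specified here.

```python
def adjacent_windows(tokens, size=50):
    """Yield (start, first_window, second_window), advancing one word at a time.

    Assumption: iteration stops when fewer than `size` words remain
    for the second window.
    """
    for start in range(len(tokens) - 2 * size + 1):
        first = tokens[start:start + size]
        second = tokens[start + size:start + 2 * size]
        yield start, first, second

# Placeholder "words" w1 ... w120 so that positions are easy to read off.
tokens = [f"w{i}" for i in range(1, 121)]
pairs = list(adjacent_windows(tokens))

start, first, second = pairs[0]
print(first[0], first[-1], second[0], second[-1])   # prints: w1 w50 w51 w100
start, first, second = pairs[1]
print(first[0], first[-1], second[0], second[-1])   # prints: w2 w51 w52 w101
```

With 120 placeholder words the generator produces 21 window pairs, the last of which compares words 21-70 with words 71-120.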
Each comparison is represented by a similarity score that measures the extent
to which the vocabulary in the two 50-word windows is the same or different. The
TextTiling similarity score is calculated using the following formula:
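$$
\text{similarity} =
\frac{\sum_{i=1}^{\text{totaltypes}} \text{freq}_1(\text{word}_i)\,\text{freq}_2(\text{word}_i)}
     {\sqrt{\sum_{i=1}^{\text{totaltypes}} \text{freq}_1(\text{word}_i)^2 \;\sum_{i=1}^{\text{totaltypes}} \text{freq}_2(\text{word}_i)^2}}
$$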
For each different word (i.e., each word type) in the two windows, the comparison
procedure first multiplies the frequency of that word-type in the first 50-word seg-
ment (freq1) times the frequency of the same word-type in the second 50-word
segment (freq2). (If a word type occurs in only one of the 50-word segments, then
the product of these frequencies is 0.) The multiplied frequencies are then summed
up, creating the numerator of the equation. In the denominator, the frequency of
each word type in each 50-word segment is squared, and those squared frequen-
cies are then summed up (for each segment); the two summations (for each seg-
ment) are then multiplied, and we then compute the square root of the resulting
product. This formula produces values between 0 and 1, where values close to 1
indicate that the two windows have many words in common, and values close to
zero indicate that the two windows have few words in common.
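The calculation described in this paragraph can be written out in a few lines. The sketch below follows the verbal description (sum of frequency products over the square root of the product of the two sums of squares); the function name `similarity` and the use of `collections.Counter` are choices made for illustration, not details of the original software.

```python
import math
from collections import Counter


def similarity(window1, window2):
    """Similarity of two word windows based on shared word-type frequencies."""
    freq1 = Counter(window1)
    freq2 = Counter(window2)
    # Numerator: freq1 * freq2 summed over word types. A type missing from
    # either window contributes 0, so only shared types need to be visited.
    numerator = sum(freq1[w] * freq2[w] for w in freq1.keys() & freq2.keys())
    # Denominator: square root of (sum of squares in window 1) times
    # (sum of squares in window 2).
    denominator = math.sqrt(
        sum(f * f for f in freq1.values()) * sum(f * f for f in freq2.values())
    )
    # Guard against empty windows (an assumption; not discussed in the text).
    return numerator / denominator if denominator else 0.0
```

For instance, comparing the toy windows `["the", "mite", "the", "bud"]` and `["the", "bud", "bud", "cone"]` gives a numerator of 2·1 + 1·2 = 4 and a denominator of √(6·6) = 6, so the score is about 0.67; two windows with no word types in common score 0.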
Figure 6.1 shows a plot of the TextTiling similarity scores for the Introduction
and Methods sections from the biochemistry research article discussed above.
Peaks on the graph represent points where the two adjacent 50-word segments are
maximally similar in their vocabulary, indicating that the two segments belong to
the same discourse unit. Valleys represent the point where the two adjacent text
segments are maximally different in their vocabulary.
Peaks and valleys can be identified automatically by computing slope measures.
Any valley that differs by at least 25% from the preceding peak is marked as a VBDU
boundary. The exact location of a VBDU boundary is often adjusted slightly to cor-
respond to written sentence or spoken turn boundaries. Figure 6.1 identifies five
valleys that represent VBDU boundaries; the peaks between these boundaries
correspond to VBDUs 3, 4, 5, and 6 presented in Text Excerpts 6.1–6.3 above.
There are several parameters that can be manipulated for the automatic identi-
fication of VBDUs, including the size of the text windows (50 words in the present
studies), the required difference between peak and valley (25% in the present stud-
ies), and the maximum score allowed for a VBDU boundary (i.e., for a valley, which
must be less than 0.2 in the present case). Manipulating these parameters would
potentially result in fewer or more VBDUs being identified in a text, although the
boundaries of those VBDUs should remain relatively constant. The values for these
parameters used here were arrived at through a process of trial and error, by consid-
ering the automatic boundaries assigned to TextTiling profiles of texts from differ-
ent genres, together with the interpretability of the resulting VBDUs.
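The role of the two boundary parameters can be illustrated with a small sketch. It is not the segmentation software used in these studies. It assumes that "differs by at least 25%" means a relative drop from the preceding peak, that the preceding peak is the highest score since the previous valley, and it omits the adjustment to sentence or turn boundaries. The function name `find_boundaries` and the sample scores are hypothetical.

```python
def find_boundaries(scores, min_drop=0.25, max_valley=0.2):
    """Return the indices of valleys that qualify as unit boundaries.

    Assumptions (not confirmed by the source): a valley is a point lower
    than its left neighbor and no higher than its right neighbor, and the
    required difference is measured as (peak - valley) / peak.
    """
    boundaries = []
    peak = scores[0] if scores else 0.0
    for i in range(1, len(scores) - 1):
        # Highest score seen since the previous valley.
        peak = max(peak, scores[i - 1])
        is_valley = scores[i] < scores[i - 1] and scores[i] <= scores[i + 1]
        if not is_valley:
            continue
        drop = (peak - scores[i]) / peak if peak > 0 else 0.0
        if drop >= min_drop and scores[i] < max_valley:
            boundaries.append(i)
        # Restart the search for the next peak from this valley.
        peak = scores[i]
    return boundaries


# Invented similarity scores, for illustration only.
scores = [0.30, 0.40, 0.35, 0.15, 0.30, 0.45, 0.38, 0.42, 0.10, 0.25]
print(find_boundaries(scores))                                  # [3, 8]
print(find_boundaries(scores, min_drop=0.15, max_valley=0.5))   # [3, 6, 8]
```

With the looser settings the shallow valley at index 6 is added, while the two boundaries found under the stricter settings stay where they were.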
Figure 6.1 Profile of TextTiling Scores in a Biochemistry Research Article (MBCSEP),
showing the boundaries of four VBDUs
3 Perceptual correlates of VBDUs
Hearst (1997) evaluates the extent to which human raters agree among themselves
on the location of textual boundaries, and the extent to which the boundaries au-
tomatically assigned by TextTiling agree with human perceptions. In both cases,
acceptable but generally weak levels of agreement were found.
We carried out a further series of experiments on the perceptual salience of
VBDU boundaries assigned automatically by our segmentation tool. Seven raters
identified the locations in texts where they perceived a topical shift or some other
kind of discourse boundary. Raters analyzed 12 text excerpts: four textbooks, four
university classroom lectures, and four conversations. Raters were asked to iden-
tify points in the text where there was a major shift in topic or rhetorical purpose.
The human analyses of all texts followed a single general pattern: there was a
very high level of agreement for the placement of a few boundaries, but much less
agreement for the placement of other boundaries. Figure 6.2 illustrates this pattern
in a passage from a sociology textbook on age. All seven raters agreed on the loca-
tion of two topical boundaries in this text excerpt: after sentence #21 and sentence
#39. Both of these breaks are at paragraph boundaries, but there are no other
grammatical or textual signals of topical units. Rather, raters identified these
boundaries based on the content of the passage. Text Excerpt 6.4 gives many of the
sentences from this textbook passage, showing the overall development:
Text Excerpt 6.4. Sociology Textbook on Aging
The final measure of population aging we will discuss is life expectancy. Life
expectancy refers to the average length of time the members of a population
can expect to live. It is not the same as life span, which refers to a theoretical
biological maximum length of life that could be achieved under ideal condi-
tions. [Paragraph 1 continues]
Life expectancy, then, is the average experience of a population. It is cal-
culated from actual mortality data from a single year [Paragraph 2 contin-
ues]
For a better understanding of life expectancy, Exhibit 3.8 gives a great deal
more detail about life expectancy in the United States; it shows the average
number of years of life remaining for people of different age, sex, and race
categories in the United States in 1990. To use the table, look at the left-hand
column to find a target age, then read across to the race and gender category
that is of interest to you. [Paragraph 3 continues]
As you spend some time calculating life expectancies from this table, you
will notice some interesting sources of variation. Average length of life varies
depending on age, race, and sex.
[Paragraph 4 continues] [SENTENCE 21]: So, the longer you live, the longer
you can expect to live (and you can quote us on that)!
[[Major Perceptual Boundary]]
The race differential in life expectancy is evident in Exhibit 3.8. African
American men of all ages have the lowest life expectancies. African American
women have life expectancies lower than European American women [Par-
agraph 5 continues]
These observations suggest two questions: Why is there a race differential
in life expectancy at all, and why does it diminish and even reverse itself at the
oldest ages? In answer to the first question, most of the race differential in
mortality is explained by [Paragraph 6 continues]
The second question regarding the convergence in the race differential
in life expectancy has received some attention, but no definitive answer. One
suggested explanation is [Paragraph 7 continues] [SENTENCE 39]: An-
other hypothesis for the convergence effect is that because African Americans
who make it to the oldest ages do so in spite of many disadvantages and long
odds, they may be survivors; that is, they may have some complex set of
physiological and social psychological survival advantages.
[[Major Perceptual Boundary]]
A final variation in life expectancy that is readily apparent in Exhibit 3.8
is the gender difference. At every age, for both races, females have higher life
expectancies than do males. [Paragraph 8 continues]
Raters perceived three major topics in this excerpt: 1) a conceptual introduction to
life expectancy (definition and how to measure it); 2) comparing/contrasting the
life expectancies of African Americans and European Americans; and 3) gender
differences in life expectancies. The boundaries between these units are at
paragraph breaks, but there are no other overt grammatical or textual signals
for the boundaries. Rather, recognition of the boundaries requires actually reading
the preceding and following paragraphs, and recognizing major shifts in the topic.
Perceptual boundaries of this type, associated with major shifts in content, can
also be associated with a shift in vocabulary, and therefore they often correspond
to a VBDU boundary. Figure 6.2 shows that the TextTiling segmentation placed
VBDU boundaries after sentence #21 and sentence #40, at essentially the same
places as the perceptual boundaries.
Figure 6.2 Placement of Topical Unit Boundaries by Raters and TextTiling: Sociology
Textbook
At the same time, all texts also had many other minor shifts in topic or purpose
that were regarded as perceptual boundaries by some raters. Figure 6.2 shows
boundaries of this type after sentences 6, 8, 11, 28, 34, and 52. There was much less
agreement on the location of these boundaries, with only 1–3 raters agreeing. In
many cases, there is also a VBDU boundary located near these less salient percep-
tual boundaries (e.g., the VBDU boundary after sentence #9).
Surprisingly, the human analyses of texts from both spoken and written genres
followed this same pattern: some boundaries are clear-cut, with most raters agree-
ing, while many other boundaries reflect more subtle shifts in topic, and raters
show less agreement on those. For example, Figure 6.3 plots the perceptual bound-
aries that raters identified in a conversation. The topic in this text shifts abruptly,
typical of many face-to-face conversations, and it is often difficult to identify dis-
crete topical units. Raters perceived a relatively clear topical break after utterance
13, when the topic shifts from a general discussion of what Speaker A has been
doing for the past month to a more specific discussion of Speaker A's plans for the
next semester at his university. The TextTiling software located a VBDU boundary
at this shift as well.
Figure 6.3 Placement of Topical Unit Boundaries by Raters and TextTiling: Conversation
In contrast, raters found it much more difficult to agree on the location of other
topic boundaries in this conversation, as the topic shifts abruptly and sometimes
subtly. In rapid succession, the participants note that it's been a hard year, talk
about work plans for the next day, admire a drawing, ask about the mail, discuss a
school project, lament the absence of money, talk about food and roommates, etc.
Raters located nine topic boundaries in this conversation, but often with only one
or two of the raters agreeing on the location of the boundary. Not surprisingly,
TextTiling seemed to perform a kind of averaging, placing boundaries in between
these less clear perceptual boundaries.
One important aspect of TextTiling is that it is actually a continuous construct,
directly representing the on-going use of vocabulary in a text, and indirectly rep-
resenting the on-going unfolding of topic. Figure 6.1 (above) illustrates the con-
tinuous vocabulary profile of a text, with numerous valleys that could all be poten-
tially interpreted as shifts in sub-topic. After the TextTiling profile has been
computed for a text, we perform a separate analytical step to segment the graph
into discrete VBDUs, representing discrete topical units (see below).
There are several parameters that can be adjusted in the TextTiling software
that affect the number of discrete VBDUs that are identified in a text. The most
important of these is the difference between the TextTiling peak and valley re-
quired to be considered as a VBDU boundary. For example, Figure 6.4 plots the
boundaries in an anthropology textbook passage. Similar to the pattern found
with other texts, human raters agreed on the location of a few topic boundaries in
this text, and then also identified several other boundaries with lesser agreement.
The VBDU segmentation software was run twice on this text: once requiring a
25% difference between the TextTiling peak and valley to be considered a VBDU
boundary, and a second time requiring only a 20% difference. With the more strict
(25%) requirement, only two VBDU boundaries were identified in the text (after
sentences 23 and 36). In contrast, seven VBDU boundaries were identified with
the requirement of a 20% difference between TextTiling peak and valley (which
included the two boundaries identified by the 25% setting). In this case, the VBDU
boundaries are identical to all major perceptual boundaries identified in this text,
plus two other intermediate boundaries.1
Figure 6.4 Placement of Topical Unit Boundaries by Raters and TextTiling: Anthropol-
ogy Textbook
1. Horn (2005) compares the location of VBDU boundaries to the location of step (and move)
boundaries in a sub-corpus of 11 biochemistry research articles taken from Kanoksilapatham's
study (see Chapter 4). Although the two methods of text segmentation agreed in some instanc-
es, in general they reflected different underlying constructs. Steps (and moves) are generally
smaller text segments than VBDUs, and they can be discontinuous in a text (so that parts of a
single move/step can be found throughout a text, composed of text segments that are not neces-
sarily contiguous). As a result, there was generally low agreement between the scope of the two
types of units and the specific location of boundaries.
In sum, the results of our experiments indicate a high level of agreement between
the human and automatic VBDU segmentations in cases where there is a high
level of agreement among human raters. However, in other cases there was little
agreement among human raters, and therefore low agreement with the VBDU
segmentation. We would argue that the automatically assigned boundaries are as
valid as human-assigned boundaries in such cases. VBDU segmentation has the
advantages of being reliable and easily applicable to a large corpus of texts. As
such, it is ideal for the investigation of generalizable discourse patterns from a
corpus perspective.
In the following chapters, texts are segmented into VBDUs as the first major
procedural step in the analysis. The focus of the analysis, though, is on the differ-
ent types of VBDUs, determined by a comprehensive description of lexico-gram-
matical characteristics, and on the generalizable patterns of discourse organization
when texts are considered as sequences of VBDUs from these different types. We
argue that VBDUs are valid units of analysis because they prove to be useful for the
description of discourse patterns in texts. That is, the following analyses show that
there are systematic linguistic differences among VBDU types, and that we can
gain useful insights into discourse organization by considering the sequences of
VBDUs in texts. Thus, the validation of VBDUs can be supported independently
from two sources: they correspond generally to human-identified perceptual dis-
course units in cases where humans are able to agree with a high degree of reliabil-
ity, and they prove to be useful and interpretable units of analysis in their own
right. We certainly would not argue that VBDUs are the single correct way to seg-
ment a text into coherent discourse units, but we hope to demonstrate that this is
a highly productive approach for corpus analysis.
At the same time, we recognize the need for extensive future research on the
textual basis of the TextTiling profile. As noted above, TextTiling is a continuous
construct, reflecting the continuous evolution of topic in a text. Future research
could also explore methods for describing texts as continuous constructs rather
than being composed of a sequence of discrete discourse units. Furthermore, we
need additional research on the mechanics of segmenting texts into VBDUs based
on the TextTiling profile. In particular, we need methods for determining the best
settings for the parameters in the TextTiling software (including the window size,
required peak/valley difference, minimum VBDU length, and whether different
parameters should be used for VBDUs of differing lengths). It seems likely that
different kinds of texts will be best segmented with different TextTiling parameter
settings, but at present we have not developed procedures for these adjustments.
Future research should help to refine the actual VBDU segmentation resulting
from TextTiling.
4 Using VBDUs to analyze the discourse structure of texts
As described in Section 2 above, TextTiling is used to segment all texts in a corpus
into vocabulary-based discourse units. Each of these VBDUs can then be treated
as a unit (or a sub-text) for the purposes of linguistic analysis. This corresponds
to Step 2 in the general bottom-up approach to corpus-based discourse analysis
(see Table 1.2 in Chapter 1).
For the linguistic analysis, each VBDU is automatically tagged to identify a
large number of linguistic features. The current version of the tagger used in the
present studies incorporates the corpus-based research carried out for the Long-
man Grammar of Spoken and Written English (Biber et al., 1999). Using dictionar-
ies, probabilities, structural rules, and contextual features, the tagger identifies a
wide range of grammatical features, including word classes (e.g., nouns, modal
verbs, prepositions), syntactic constructions (e.g., WH relative clauses, condition-
al adverbial clauses, that-complement clauses controlled by nouns), semantic
classes (e.g., activity verbs, likelihood adverbs), and lexico-grammatical classes
(e.g., that-complement clauses controlled by mental verbs, to-complement clauses
controlled by possibility adjectives). Appendix Two lists the full set of features that
are identified by the tagger.
Once individual VBDUs are tagged, it is possible to analyze the discourse
development of a text by tracking the use of linguistic features across the VBDUs
of a text. For example, Figure 6.5 shows the distribution of three linguistic features
– passive verbs, possibility modals (can, could, may, might), and communication
verbs (e.g., suggest, report) – across the 10 VBDUs of a biology research article.
This plot shows that the different sections of this research article are very different
in their use of these linguistic features. For example, passive verbs are especially
common in the Methods section, while possibility modals are especially common
in the Introduction. However, the VBDU analysis further shows that there are in-
teresting patterns of variation within these sections. For example, communication
verbs are especially common in the last two VBDUs of the Introduction, but con-
siderably less common in the very first VBDU. Possibility modals are especially
common in the second VBDU, but considerably less common in the first and last
VBDU of the Introduction. At the other end of the article, communication verbs
are moderately common in the first VBDU of the Discussion section, but then rare
in the final two VBDUs. Patterns like these can be interpreted as reflections of the
shifts in communicative purpose within the scope of a text. (See the fuller
discussion of this research article, AGFOENT01, in Chapter 7.)
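Tracking one feature across the VBDUs of a text can be sketched as below. This is a deliberately crude stand-in for the grammatical tagger: it matches word forms only (so the month "May" would be miscounted as a modal), and the normalization to a rate per 1,000 words is an assumption, since the counting unit is not specified here. The function names are hypothetical; the modal list is the one given in the text.

```python
import re

# Possibility modals as listed in the text above.
POSSIBILITY_MODALS = {"can", "could", "may", "might"}


def rate_per_thousand(vbdu_text, word_set):
    """Occurrences of the words in `word_set` per 1,000 words of one VBDU.

    Assumption: simple word-form matching replaces real grammatical tagging.
    """
    words = re.findall(r"[a-z']+", vbdu_text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in word_set)
    return 1000.0 * hits / len(words)


def feature_profile(vbdus, word_set):
    """One normalized rate per VBDU, in text order."""
    return [rate_per_thousand(v, word_set) for v in vbdus]


# Two invented mini-VBDUs, for illustration only.
vbdus = [
    "The mites can cause damage and may spread.",
    "Samples were washed twice.",
]
print(feature_profile(vbdus, POSSIBILITY_MODALS))  # [250.0, 0.0]
```

Plotting such a profile for several features side by side gives a display of the same general kind as Figure 6.5.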
[Figure 6.5 plot: passive verbs, possibility modals, and communication verbs; y-axis "Dimension Score" (10–80); x-axis Intro (VBDU 1–3) | Methods (VBDU 4–5) | Results (VBDU 6–7) | Discussion (VBDU 8–10)]
Figure 6.5 Distribution of linguistic features across the VBDUs of a biology research article
5 Going one step further: Identifying generalizable VBDU types
VBDU segmentation of a text, and the linguistic analysis of individual texts, are
only preliminary analytical steps required as the basis for the two ultimate goals of
this approach: to provide a comprehensive linguistic description of discourse
units and the flow of discourse within texts, and to describe generalizable patterns
of discourse organization that hold across all texts of the target corpus. One im-
portant measure of this second goal is to investigate whether those general pat-
terns can be applied to individual texts to reveal new insights into their discourse
structure. To achieve these goals, four analytical steps are required (see also Sec-
tion 2 in Chapter 1, especially Table 1.2):
(1) Identify all Vocabulary-based Discourse Units (VBDUs) in a large corpus rep-
resenting a genre, using TextTiling (Segmentation);
(2) Analyze the linguistic characteristics of each VBDU, using a grammatical tag-
ger and multi-dimensional analysis (Linguistic analysis of each unit);
(3) Identify and interpret the basic VBDU types, using Cluster Analysis
(Classification and Linguistic description of discourse categories); and
(4) Analyze the discourse structure of texts as sequences of VBDU types (Text
structure and Discourse organizational tendencies).
In the following chapters, the first major analytical goal is to provide a comprehen-
sive linguistic description of the unfolding discourse within texts. For this pur-
pose, we apply multi-dimensional (MD) analysis, rather than focusing on the dis-
tribution of individual linguistic features. That is, individual features will vary in
use across VBDUs, reflecting the functional associations of each feature in relation
to the communicative goals of each VBDU; Figure 6.5 illustrates such patterns of
variation. However, it is further possible to investigate the developing discourse of
texts more comprehensively, considering a much wider range of linguistic features.
The multi-dimensional analytical approach enables descriptions of that type.
As introduced in Appendix One, MD analysis is a methodological approach
that applies multivariate statistical techniques (especially factor analysis and clus-
ter analysis) to the investigation of genre/register variation in a language. The ap-
proach was originally developed to analyze the range of spoken and written genres
(or registers) in English (Biber, 1986, 1988). There are two major quantitative
aspects of an MD analysis: (1) identifying the salient linguistic co-occurrence
patterns in a language, the "dimensions"; and (2) comparing texts and genres in
the linguistic space defined by those dimensions.
The results of the MD analysis of VBDUs can be applied directly to investigate
the discourse organization of texts (see, e.g., Csomay, 2005b). That is, each VBDU
has a score for each dimension, where each of the dimensions represents a distinct
set of co-occurring linguistic features. The dimension scores capture the extent to
which the co-occurring linguistic features are used in the VBDU. By tracking
changes in the dimension scores across the VBDUs of a text, we are able to track
the linguistic development of discourse in the text.
If we imagine a visual representation that plots the dimension scores of each
VBDU, we would notice that discourse units are scattered throughout this multi-
dimensional linguistic space. At the same time, there would be dense groupings or
clusters of VBDUs, representing linguistic styles that are used in multiple texts. For
example, Figure 6.6 shows the distribution of VBDUs extracted from a corpus of
conversations, plotted in a two-dimensional space resulting from an MD analysis.
This figure shows some distinct clusters of VBDUs, referred to as "text types"; the
VBDUs grouped into each cluster are maximally similar in their multi-dimension-
al profiles (see Biber, 1989, 1995). Figure 6.6 clearly shows how VBDUs of a spe-
cific type in a genre tend to exhibit particular linguistic characteristics, as described
by their multi-dimensional linguistic coordinates. For example, all the VBDUs in
Text Type 1 (on Figure 6.6) have large positive scores on Dimension 1 and large
negative scores on Dimension 2. In contrast, all the VBDUs in Text Type 2 have
moderate positive scores on Dimension 1 and positive scores on Dimension 2.
Figure 6.6 Plot of VBDUs along Dimension 1 vs. Dimension 2
Text types are linguistically well defined; text type distinctions have no necessary re-
lation to genre distinctions. Rather, text types are defined such that the texts within
each type are maximally similar in their linguistic characteristics, regardless of their
situational/genre characteristics. However, because linguistic features have strong
functional associations, text types can be interpreted in functional terms.2
In the methodological approach here, text types are identified quantitatively
using cluster analysis, with the dimensions of variation as predictors. Cluster anal-
ysis groups VBDUs into clusters on the basis of shared multi-dimensional/lin-
guistic characteristics: the VBDUs grouped in a cluster are maximally similar lin-
guistically, while the different clusters are maximally distinguished (see Biber,
1989, 1995).
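The grouping step can be illustrated with a toy example. The specific clustering algorithm is not named here, so the sketch below swaps in a minimal k-means as a stand-in. The dimension scores are invented, and the function name `kmeans` and its deterministic initialization (the first k points) are choices made purely for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Tiny k-means; the first k points serve as the initial centroids.

    Assumption: k-means stands in for whatever clustering method is
    actually used; it is not claimed to be the authors' procedure.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each VBDU joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented (Dimension 1, Dimension 2) scores for six VBDUs.
vbdu_scores = [(8.0, -5.0), (2.0, 3.0), (7.5, -4.0),
               (2.5, 2.0), (9.0, -6.0), (1.5, 3.5)]
labels, centroids = kmeans(vbdu_scores, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

In this toy data, cluster 0 collects the VBDUs with large positive Dimension 1 and large negative Dimension 2 scores, and cluster 1 those with moderate positive Dimension 1 and positive Dimension 2 scores.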
2. Text types and genres (or registers) represent complementary ways to dissect the textual
space of a language. Text types and genres/registers are similar in that both can be described in
linguistic and in situational/functional terms. However, the two constructs differ in their
primary bases: genres/registers are defined in terms of their situational characteristics, while text
types are defined linguistically.
It is possible to use these text types to achieve the second major analytical goal
of this approach: describing generalizable patterns of discourse organization that
hold across all texts of a corpus. In particular, we can investigate the distribution
of text type sequences across all the VBDUs and texts of a corpus, interpreting the
preferred sequences of text types in functional terms. These discourse patterns can
further be applied to the description of individual texts, to identify texts with
typical versus more specialized discourse organizations. Chapters 7–8 illustrate the
application of these methods to spoken and written genres.
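One simple way to look for preferred sequences of text types is to count how often each type is followed by each other type across the corpus. The sketch below assumes each text has already been reduced to a list of text-type labels, one per VBDU. The labels, the data, and the function name `transition_counts` are invented for illustration, and counting adjacent pairs is only one possible operationalization of "sequence".

```python
from collections import Counter


def transition_counts(texts):
    """Count adjacent (current type, next type) pairs across all texts."""
    counts = Counter()
    for sequence in texts:
        # Pair each VBDU's type with the type of the VBDU that follows it.
        counts.update(zip(sequence, sequence[1:]))
    return counts


# Each inner list is one text, given as the text types of its VBDUs in order.
corpus = [
    [1, 1, 2, 3],
    [1, 2, 2, 3],
    [1, 2, 3, 3],
]
counts = transition_counts(corpus)
print(counts[(1, 2)], counts[(2, 3)], counts[(3, 1)])  # 3 3 0
```

Frequent pairs point to typical orderings, and a text whose transitions are rare in the corpus would be a candidate for a more specialized discourse organization.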
In previous pilot research of this type, we investigated the vocabulary-based
discourse unit types in a large multi-genre corpus, including university classroom
teaching sessions, textbooks, and academic research articles (Biber, Csomay, Jones,
& Keck, 2004). Because there are striking linguistic differences among these genres,
it was relatively easy to identify different discourse unit types with dramatically dif-
ferent linguistic characteristics. Genre proved to be the most important factor in
that study, with the VBDU types being constrained by genre distinctions. This was
especially the case for the spoken/written opposition represented in the corpus for
that study: The VBDU types used for spoken discourse were for the most part dis-
tinct from the VBDU types used for the written genres in this corpus.
That pilot research suggests that this analytical approach will be more produc-
tive if carried out on a restricted corpus representing only a single genre. That is,
by minimizing the influence of genre and mode on the macro-level, we are more
likely to capture differences associated with the particular communicative pur-
poses that can shift within a text. In this way, the VBDU types can be interpreted
as sub-genres on a micro-level. The following two chapters illustrate analyses of
this type: Chapter 7 focuses on academic research articles while Chapter 8 focuses
on university classroom teaching.
The advantages of this approach are that the underlying units of analysis and
constructs are identified through automated large-scale corpus analysis, and thus
they represent the typical patterns of use found over the scope of the entire corpus.
However, the ultimate application of the analysis takes us back to the individual text,
to investigate the extent to which we can describe the discourse organization of a
particular text in terms of these generalizable constructs derived from the corpus.
The discussion in this section has been relatively abstract, outlining the major
analytical steps followed for the study of VBDU types. It is much easier to under-
stand the application of these analytical procedures through concrete examples,
and these are provided by the case studies in the following two chapters.
chapter 7
Vocabulary-based discourse units in biology research articles
WITH James K. Jones
Textual analyses of moves were first proposed by Swales (1981) as a way to
understand the internal discourse structure of Introductions in academic
research articles. Since that time, there have been numerous move-based
investigations of research articles (see the survey of studies in Chapter 2),
including the corpus-based study of moves in biochemistry research articles
presented in Chapter 4. These studies have extended the original framework
developed by Swales, considering the moves found in all sections of research
articles (Introduction-Methods-Results-Discussion), and comparing the move
structure of research articles from different academic disciplines.
The present chapter takes a complementary approach, investigating the
discourse organization of research articles from the perspective of Vocabulary-
Based Discourse Units (introduced in Chapter 6). The study here focuses
specifically on empirical research articles in biology. All of these articles already
have their internal discourse structure explicitly marked by four sections:
Introduction–Methodology–Results–Discussion. The research question that
we set for ourselves in this study was whether a multidimensional discourse
analysis could identify other micro-genres that operate within the scope of
these rhetorical sections, thus allowing a more detailed analysis of the internal
discourse organization of these academic research articles. The analysis is
motivated by the same two overall goals that we described in Chapter 6: to
provide a comprehensive linguistic description of discourse units and the flow
of discourse within texts; and to describe generalizable patterns of discourse
organization that hold across all texts of the target corpus.
1 Constructing the corpus of VBDUs
The first step in the analysis was to construct a corpus representing a broad sam-
pling of empirical research articles in biology. We included articles from 10 major
journals from several different subfields:
Agricultural and Forest Entomology
Annals of Human Genetics
Clinical and Experimental Pharmacology and Physiology
Conservation Biology
Functional Ecology
International Journal of Plant Sciences
Journal of Anatomy
Journal of Applied Microbiology
Journal of Avian Biology
Journal of Medical Primatology
Recent issues of these journals that were available on-line (usually from summer
or fall 2004) were chosen, with the first 10 empirical research articles selected from
each journal. (Survey articles and theoretical articles were excluded from the
corpus.) All articles included in the corpus had four sections:
Introduction–Methodology–Results–Discussion (IMRD). At the outset, these sections
were treated as separate texts. The article sections provided a high-level
representation of the discourse structure, but we were interested in the internal
discourse organization within each section. We therefore segmented articles into
sections before undertaking the VBDU analysis. Our corpus thus initially consisted
of 400 texts: 10 academic journals × 10 articles from each journal × 4 sections in
each article.
The next step in the analysis was to segment these texts into Vocabulary-Based
Discourse Units (VBDUs). We applied the TextTiling procedure to automatically
identify VBDU boundaries (described in Chapter 6).
The following text sample from the introduction of a research article illustrates
the kind of discourse units identified by the TextTiling tool, showing how VBDU
boundaries typically correspond to a shift in topic and/or purpose. Each of these
two VBDUs contains many words not found in the adjacent stretch of discourse.
The first VBDU introduces the general context of the study, referring to various
coniferous hosts, Europe, Northern America, and Mediterranean region. The first
VBDU also introduces the eriophyoid mite, which is then discussed further in the
second VBDU. However, the two VBDUs differ in purpose: in the first VBDU, we
learn about the general distribution of the eriophyoid mite; for example, the mites
are associated with fast growing plant tissues. Then, in the second VBDU, the topic/
purpose shifts to a more specific discussion of why seasonal variation occurs in the
abundance of these mites. This new topic is associated with many new vocabulary
items not found in the first VBDU, including: partly, explained, typical, dynamics,
attack, colonized, first, swell, enlarge, stop, time, reproducing, etc.
Text Excerpt 7.1. Text from the Introduction of a research article (Agricultural and
Forest Entomology; AGFOENT01I), showing the location of VBDU boundaries.
(The distinctive words in VBDU #2 are shown in bold underlined.)
ARTICLE BEGINS
[VBDU 1]
The eriophyoid mite Trisetacus juniperinus has been frequently recorded on
various coniferous hosts in Europe, as well as in Northern America, and is
responsible for shoot deformation and death of apical cells (Keifer, 1975;
Castagnoli, 1996). In the Mediterranean region, it can cause considerable
damage to the evergreen cypress, Cupressus sempervirens L., especially in
nurseries and young stands (Nuzzaci & Monaco, 1977; Castagnoli & Simoni,
1998; Roques & Battisti, 1999). This cypress is one of the most important tree
species for landscape and forestry in the whole Mediterranean region (Teissier
du Cros, 1999). The mites appear to be associated with fast growing plant
tissues, such as the apical buds of the shoots and the young reproductive organs
(male and female cones) (Guido et al., 1995; Castagnoli & Simoni, 2000).
Active meristemes in the apical buds are available throughout the year, particularly
in nurseries and young stands, whereas cones are produced only
when trees are sexually mature (10–15 years) (Teissier du Cros, 1999). A detailed
study of the life history of T. juniperinus on young cypress trees showed
that great seasonal variation in abundance might occur in this species, with a
major peak during the spring growth period (Castagnoli & Simoni, 2000).
[VBDU 2]
Such variation was partly explained by the typical dynamics of the mite at-
tack on buds: the colonized buds first swell and enlarge, then stop growing.
At that time, the mites are reproducing within the buds and a high number
of eggs and juveniles can be found inside (fig. 1a). Subsequently, the mites
leave to disperse in the crown, whereas the deformed buds can resume
growth to some extent (fig. 1b). New attacks can be detected within the same
year but, usually, they are less severe. This behaviour of the mite and the re-
action of the tree can be considered as something in between the formation
of a true gall and the defence reaction of the tissues, both phenomena having
been described for eriophyoid mites (Westphal & Manson, 1996).
Using the TextTiling techniques, we segmented all texts in our corpus into VB-
DUs. Table 7.1 below shows the composition of the original corpus and the number
of VBDUs identified in each research article section.
Table 7.1 Corpus used for the analysis: Breakdown by section

Section         # of texts   # of words   Total VBDUs   # of VBDUs ≥ 100 words
Introduction       100          61,000        292              238
Methods            100         109,000        526              426
Results            100         122,000        469              381
Discussion         100          94,000        540              458
Total              400         386,000      1,827            1,503
Table 7.2 presents descriptive statistics for the VBDUs that were extracted from
each of the four different article sections. VBDUs average around 240 words in
each section, and the longest VBDU per section ranges from about 600 words
(Results) to just over 900 words (Discussion). We excluded
all VBDUs shorter than 100 words from the quantitative analyses because
the quantitative distribution of linguistic features cannot be reliably measured in
short texts. Thus, the shortest VBDUs in Table 7.2 are 100 words.
Table 7.2 Descriptive statistics for VBDU length in each register

                                        VBDU Length (words)
                           N of VBDUs   Mean   Std Dev   Min.   Max.
Total research articles      1,503       243     118      100    906
Breakdown by section
  Introduction                 238       242     115      100    770
  Methods                      426       240     120      100    835
  Results                      381       232     105      100    602
  Discussion                   458       254     127      100    906
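The length filter and the descriptive statistics reported in Table 7.2 are simple to compute once VBDU lengths are available. The following sketch assumes a plain list of word counts, one per VBDU; the function name and the example lengths are invented for illustration, and a sample (n - 1) standard deviation is assumed because the text does not say which estimator was used.

```python
from statistics import mean, stdev

MIN_WORDS = 100  # exclusion threshold stated in the text


def length_summary(vbdu_lengths, min_words=MIN_WORDS):
    """Drop VBDUs shorter than `min_words` and summarize the rest."""
    kept = [n for n in vbdu_lengths if n >= min_words]
    return {
        "n_total": len(vbdu_lengths),   # all VBDUs identified
        "n_kept": len(kept),            # VBDUs retained for analysis
        "mean": mean(kept),
        "std_dev": stdev(kept),         # sample standard deviation (assumed)
        "min": min(kept),
        "max": max(kept),
    }


# Invented lengths, not taken from the corpus.
print(length_summary([60, 100, 200, 300, 95]))
```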
2 Analyzing the linguistic characteristics of VBDUs: Multi-dimensional analysis
To achieve the first major analytical goal (providing a comprehensive linguistic
description of research article discourse units), we carried out a multi-dimensional
(MD) analysis of this corpus. After the biology corpus was segmented, each
discourse unit was automatically tagged for a large number of linguistic features
using the Biber grammatical tagger (see Chapter 6, Section 4). Then multi-dimen-
sional analysis was used to identify the major patterns of linguistic variation among
discourse units. Tables 7A, 7B and 7C at the end of the chapter give the full facto-
rial structure for the analysis in the present study, while Table 7.3 summarizes the
important linguistic features defining each dimension.1
As introduced in earlier chapters, MD analysis requires two major quantitative
steps: (1) identifying the salient linguistic co-occurrence patterns, using factor
analysis; and (2) comparing texts and genres/registers in the linguistic space de-
fined by those co-occurrence patterns: the dimensions. Appendix One provides a
fuller conceptual and methodological introduction to multi-dimensional analysis.
Only 38 of the original 120+ linguistic features were retained in the factor
analysis for the present study. Features were discarded either because they did not
occur frequently enough to be considered important, or because they overlapped
to a large extent with other features. For example, the counts for common verbs,
nouns, and adjectives overlapped extensively with the semantic categories for
those word classes, even though the counts were derived independently. In other
cases, features were dropped because they were extremely rare in biology research
articles (e.g., 2nd person pronouns, phrasal verbs). Some of these features were
combined into a more general class. For example, to-clauses were originally bro-
ken down into five lexico-grammatical features, depending on the semantic class
of the controlling verbs: communication verbs, mental verbs, verbs of desire, verbs
of causation, and epistemic verbs. However, these lexico-grammatical features did
not occur frequently enough in this corpus, and so they were all combined into a
single feature: verb + to-clause. Similarly, all passive constructions were combined
into a single feature (including agentless, by-passives, and non-finite passive claus-
es). In this case, the individual features were all relatively common, but they did
not vary sufficiently across the texts of this corpus to figure prominently in the fi-
nal factor analysis.
1. Principal components analysis, with a promax rotation, was used for the analysis. Features
with a communality estimate greater than .1 were retained in the final factor analysis. All features
with loadings greater than |.3| are listed in Table 7.3. In addition, features with loadings slightly
less than |.3| are listed in parentheses.
Table 7.3 Summary of the four dimensions from the factor analysis of the biology corpus*

Dimension 1: Evaluation of possible explanations
Features with large positive loadings:
predicative adjectives, main verb be, adjective + to-clause, adjective + that-clause, adverbs,
prediction modals, possibility modals, linking adverbials, causative adverbial subordination,
conditional adverbial subordination, pronoun it, 1st person pronouns, (3rd person pronouns,
necessity modals)
Features with large negative loadings:
nouns

Dimension 2: Current state of knowledge versus past events and actions
Features with large positive loadings:
communication verbs, communication verb + that-clause, present tense, perfect aspect,
epistemic verb + that-clause, relative clauses, demonstrative pronouns, (verb + to-clause)
Features with large negative loadings:
past tense, clausal coordination, (concrete nouns)

Dimension 3: Procedural presentation of actions / events vs. elaborated description
Features with positive loadings:
passive voice verbs, activity verbs, past tense, mental verbs, progressive aspect, time adverbials,
cognitive nouns
Features with negative loadings:
attributive adjectives

Dimension 4: Abstract / theoretical discussion of concepts
Features with positive loadings:
nominalizations, long words, abstract nouns, process nouns, cognitive nouns, attributive
adjectives, noun + that-clause

* See Appendix Two for examples and Biber et al. (1999) and Quirk et al. (1985) for further
description of each of the linguistic features.
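The convention used to build Table 7.3 from the factor loadings can be sketched as follows. The loadings in the example are invented for illustration (the actual values are given in Table 7A), and the function name is hypothetical; only the |.3| cutoff comes from the analysis described above.

```python
def salient_features(loadings, cutoff=0.3):
    """Split one factor's features into positive and negative salient sets.

    `loadings` maps feature name -> loading on a single factor. A feature
    is salient when its absolute loading exceeds `cutoff`. Each list is
    ordered from the strongest loading to the weakest.
    """
    positive = sorted((f for f, v in loadings.items() if v > cutoff),
                      key=lambda f: -loadings[f])
    negative = sorted((f for f, v in loadings.items() if v < -cutoff),
                      key=lambda f: loadings[f])
    return positive, negative


# Invented loadings for a single factor (not the values in Table 7A).
example = {
    "predicative adjectives": 0.62,
    "possibility modals": 0.55,
    "past tense": 0.10,
    "nouns": -0.48,
}
print(salient_features(example))
```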
The solution for four factors was selected as optimal.2 Each factor comprises a set
of linguistic features that tend to co-occur in the discourse units from the biology
corpus. Factors are interpreted as underlying dimensions of variation based on
the assumption that linguistic co-occurrence patterns reflect underlying commu-
nicative functions. That is, particular sets of linguistic features co-occur frequently
in texts because they serve related communicative functions.
2. Taken together, these factors account for only 27% of the shared variance (see Table 7A),
but they are readily interpretable. Solutions with additional factors accounted for relatively little
additional variance, and subsequent factors were represented by few features. The Promax rota-
tion used to conduct this analysis allows for correlated factors, although the inter-factor corre-
lations in the present analysis were all small (see Table 7B).
For example, the positive features on Factor 1 (e.g., predicative adjectives,
main verb be, adjective + to-clause, adjective + that-clause, prediction modals,
possibility modals, linking adverbials, causative adverbial subordination, conditional
adverbial subordination) co-occur in academic discourse that presents logical
possibilities and compares the merits of competing explanations. Text Excerpt
7.2, a VBDU from a Discussion section, illustrates the dense use of positive
Dimension 1 features:
Text Excerpt 7.2. From a Discussion section (Functional Ecology; FUNCECO04D)
(Selected positive Dimension 1 features are shown in bold underlined; Dimension
1 score = 14.4)
It is conceivable that recently created grasslands will maintain a weedy, rud-
eral character on a long-term basis and not develop into the stable communi-
ties typical of old grasslands. The opening of gaps in the sward after drought
may potentially make these communities more vulnerable to colonization by
invasive species more suited to the changed climatic conditions. However, if
the model predictions for wetter winters as well as drier summers prove cor-
rect, the impact on species composition and the dangers from invasive species
will be smaller than they would be otherwise. The strategy of conversion of
arable land to grassland is therefore still desirable on conservation grounds.
However, weather as wet as that in the period 1998–2001 is likely to remain
relatively rare, and it may be advisable to adjust seed mixes, where sown, to
include more deep-rooting perennial forb species which will persist through
periods of drought.
Reflecting the functions of these co-occurring features in discourse units, the inter-
pretive label Evaluation of possible explanations can be proposed for Dimension 1.
Dimension 2 has features with positive loadings as well as features with nega-
tive loadings, representing two distinct co-occurrence sets. These two feature sets
comprise a single dimension because they tend to occur in complementary distri-
bution: when a discourse unit has a high frequency of the positive set of features,
that same discourse unit will tend to have low frequencies of the negative set of
features, and vice versa.
The positive features on Dimension 2 include communication verbs (espe-
cially controlling a that-clause), present tense, perfect aspect, and epistemic verb +
that-clauses. Those features occur in complementary distribution to the features
with negative loadings: past tense, clausal coordination, and concrete nouns.
On first consideration, this distribution of features is surprising, representing
exactly the opposite pattern from that found in the Biber (1988) multi-dimension-
al analysis of general spoken and written registers. That is, Dimension 2 in the
1988 study was defined by past tense verbs co-occurring with perfect aspect verbs,
in complementary distribution to present tense verbs. That dimension was inter-
preted as representing stereotypical narrative discourse, especially fictional narra-
tive. In contrast, Dimension 2 in the present analysis shows that present tense
tends to co-occur with perfect aspect in biology academic discourse units, and
that both of those features have a complementary distribution to past tense. In this
case, perfect aspect verbs function to report past findings that continue to have
current validity; thus they co-occur with present tense verbs that often report
timeless facts and the current state of knowledge. An example of this is shown in
Text Excerpt 7.3:
Text Excerpt 7.3. From an Introduction section (Agricultural and Forest Entomol-
ogy; AGFOENT05I) (Present tense and perfect aspect verbs are shown in bold
underlined; Dimension 2 score = 0.5)
After hatching, larval clutches develop through five instars before pupation
(Elliott & Bashford, 1978 ). Gregarious newly emerged neonates initially feed
in the vicinity of their eggshells, avoiding eucalypt oil glands by skeletonizing
the leaf surface. However, from the third instars onwards, larvae feed on all
leaf material. Upon cessation of the larval stage, larvae leave the host plant
and enter a prepupal stage in the soil, after which pupation occurs in cocoons
formed from silk, bodily fluids and surrounding soil particles (Elliott & Bash-
ford, 1978; Mcquillan, 1985 ). A suite of natural enemies has been recorded to
attack the immature stages of M. privata. Predators of larvae include spiders
and an unidentified mirid (Hemiptera: Miridae) (Lukacs, 1999). [...] The larvae
of M. privata are oligophagous, having been recorded feeding in the field
on at least 27 species of eucalypt (Neumann & Collett, 1997). [...] Variations
in the level of E. globulus defoliation following outbreaks of M. privata have
been recorded in E. globulus genetics trials and plantations (Farrow et al.,
1994; Jones et al., 2002).
At the other extreme, the negative features on Dimension 2 (past tense verbs and
clausal coordination) are used mostly for a simple reporting of past actions and
events, as in Text Excerpt 7.4:
Text Excerpt 7.4. From a Methods section (Agricultural and Forest Entomology;
AGFOENT05M) (Past tense verbs are shown in bold underlined; Dimension 2
score = -7.5)
Before being searched each tree was divided into thirds, which represented a
north eastern, southern and north-western aspect. The juvenile foliage of
each aspect was simultaneously searched by one of three scorers for the
number of egg batches present, with scorers alternating which third of the tree
was searched. Searches were conducted for 2 min, or less if the entire tree was
searched within the 2 min. The leaves with egg batches were removed and
returned to the laboratory, where the number of batches per tree was record-
ed, as well as the size of each egg batch. The number of eggs within a batch
parasitized by Telenomus sp. (see Telenomus sp. egg parasitism, Woolnorth)
was also recorded. Prior to analysis a log transformation was used to normal-
ize the count data.
Considering the functions of these complementary sets of co-occurring features,
we can propose the interpretive label Current state of knowledge versus past events
and actions for Dimension 2.
For the most part, Dimensions 3 and 4 have only positive features. Dimension
3, which is interpreted as Procedural presentation of actions/events (versus elabo-
rated description), consists of passive voice verbs, past tense verbs, progressive as-
pect verbs, time adverbials, activity verbs, and mental verbs and nouns. Text Ex-
cerpt 7.4 above illustrates many of these features co-occurring in a typical
procedural discussion from a methods section. Nearly all verb phrases in this dis-
course unit are in the passive voice and incorporate a past tense activity verb (e.g.,
was divided, was searched, were conducted, were removed, was recorded, was used).
This sample also illustrates the use of (non-finite) progressive verbs (e.g., being
searched, alternating) and time adverbials (before, simultaneously, for 2 min., prior
to). Only one negative feature occurs on this dimension: attributive adjectives. VB-
DUs with a dense use of attributive adjectives (coupled with the absence of activity
verbs, mental verbs, etc.) tend to have a descriptive focus.
Finally, Dimension 4 includes mostly nominal features: nominalizations, long
words, abstract nouns, process nouns, cognitive nouns, noun + that complement
clause, and attributive adjectives. At the same time, many noun classes do not co-
occur with these features, such as concrete nouns, animate nouns, or place nouns
(the last two were dropped from the factor analysis because they did not co-vary
significantly with other features in this corpus). The nouns that do co-occur on
Dimension 4 refer to abstract concepts and processes, prompting the interpretive
label Abstract discussion of concepts. Text Excerpt 7.5 illustrates these characteris-
tics in a discussion section VBDU:
Text Excerpt 7.5. From a Discussion section (Conservation Biology; CONSBIO05D)
(Selected Dimension 4 features, namely nominalizations, long words, and abstract/process
nouns, are shown in bold underlined; Dimension 4 score = 11.8)
Agricultural intensification had a profound impact on nocturnal and crep-
uscular aerial insect abundance, and certain insect families, many of which
are host-specific, were less common on conventional farms than on organic
farms. In particular, insect families important in bat diets were adversely affected
by agricultural intensification. Changes in land use through agricultural
intensification have reduced resource abundance for bats and have reduced
the stability and predictability of such food resources. Because bat
communities are resource-limited (Bonaccorso 1979; Findley 1993), our data
support the hypothesis that agricultural intensification has been a factor in
the reduction in the numbers of key dietary components for bats and that this
reduction has led to reduced bat activity on conventional farms. Significant
correlations between the activity of bats and the abundance of their prey support
assumptions in the United Kingdom's biodiversity action plans that agricultural
intensification has been a significant factor leading to declines in
bat populations. Furthermore, our data suggest that managing farms to max-
imize insect abundance, especially that of key insect families, by maintaining
diverse and structurally varied habitats and reducing agrochemical use,
would benefit bat populations.
3 Comparing the multi-dimensional characteristics of research article sections
The corpus for this study was designed to represent four registers: the Introduc-
tion (I), Methods (M), Results (R), and Discussion (D) sections of biology research
articles. Based on this design, it is possible to contrast the multi-dimensional char-
acteristics of I-M-R-D sections from a register perspective (see also Finegan &
Biber, 2001).
We can use the dimensions to compare VBDUs from different sections by com-
puting a dimension score for each discourse unit. Dimension scores (or factor
scores) are computed by summing the individual scores of the co-occurring linguistic
features on a dimension (see Biber, 1988, pp. 93–97). For example, the Dimension 1
score for each VBDU is computed by adding together the frequencies of
predicative adjectives, main verb be, adjective + to-clause, adjective + that-clause, etc.
(the features with positive loadings on Factor 1; see Table 7.3) and then subtracting
the frequency of nouns, the only feature with a negative loading.
The individual feature counts are first standardized so that each feature has a
comparable scale, with a mean of 0.0 and a standard deviation of 1. This process
converts the feature scores to scales representing standard deviation units, so that
all features on a factor have equivalent weights in the computation of dimension
scores (see Biber, 1988, pp. 93–97). (The standardization is based on the overall
means and standard deviations for each feature in the biology academic corpus.)
Then, dimension scores are computed by summing the standardized frequencies
for the features comprising each of the four dimensions.
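The computation of dimension scores can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the software used in the study: the function names are hypothetical, the feature rates are invented, only one positive Dimension 1 feature is paired with the negative feature (nouns), and a sample standard deviation is assumed.

```python
from statistics import mean, stdev


def standardize(rows, features):
    """Convert raw feature rates to z-scores using corpus-wide statistics.

    `rows` is a list of dicts (one per VBDU) mapping feature -> rate.
    Each feature ends up with mean 0 and standard deviation 1 across rows.
    """
    stats = {}
    for f in features:
        values = [r[f] for r in rows]
        stats[f] = (mean(values), stdev(values))
    z_rows = []
    for r in rows:
        z = {}
        for f in features:
            m, sd = stats[f]
            # A feature with no variation carries no information.
            z[f] = (r[f] - m) / sd if sd else 0.0
        z_rows.append(z)
    return z_rows


def dimension_score(z_row, positive, negative=()):
    """Sum z-scores of positive features, subtract those of negative ones."""
    return sum(z_row[f] for f in positive) - sum(z_row[f] for f in negative)


# Invented rates for three VBDUs (not corpus data).
rows = [
    {"possibility_modals": 0, "nouns": 300},
    {"possibility_modals": 5, "nouns": 250},
    {"possibility_modals": 10, "nouns": 200},
]
z_rows = standardize(rows, ["possibility_modals", "nouns"])
for z in z_rows:
    print(dimension_score(z, ["possibility_modals"], ["nouns"]))
```

Because every feature is rescaled to standard deviation units before summing, a very frequent feature such as nouns does not dominate the score simply because its raw counts are larger.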
Figure 7.1 plots the mean dimension scores for each of the four article sec-
tions, showing that there are important linguistic differences in their preferred
styles. For example, VBDUs in the Introduction tend to have large positive scores
for Dimension 2 (Current state of knowledge), and large negative scores for Dimen-
sion 3 (representing the absence of procedural features, coupled with a dense use
of attributive adjectives for elaborated description).
Figure 7.1 Overall mean dimension scores for the research article sections
Methods and Discussion sections are the most distinctive in their multi-dimen-
sional characterizations, but the two have almost opposite multi-dimensional pro-
files. VBDUs from Methods sections have large negative scores on Dimension 2 (a
focus on past events) and large positive scores on Dimension 3 (Procedural presen-
tation). The negative scores for Dimensions 1 and 4 reflect the absence of evalua-
tive and abstract features in Methods VBDUs. In contrast, the Discussion section
is the only part of these research articles to show a large positive score for Dimen-
sion 1 (Evaluation of possible explanations), and this section also shows the largest
positive scores for Dimension 2 (Current state of knowledge) and Dimension 4
(Abstract/theoretical discussion).
In sum, the multi-dimensional analysis shows that there are important linguistic
differences across the major sections of research articles, reflecting the general
communicative purposes of each section (e.g., theoretical discussion and
evaluation of explanations versus a procedural description of the actual steps in
the analysis). However, as the following section shows, we can use this same general
approach to investigate discourse patterns at a more detailed level of analysis.
4 The multi-dimensional profile of VBDUs within a research article: Tracking the movement of discourse
Research article sections provide an overt indication of the authors' intended purpose:
introducing the topic; describing the methodology; reporting results; discussing
the implications. As the previous section has shown, there are important
linguistic differences associated with these major shifts in purpose.
However, in the present approach, we compute dimension scores for each
VBDU in a research article. As a result, it is possible to track the discourse flow of
a research article by considering the change in dimension scores across all VBDUs.
Such analyses show that the authors' purpose can shift in less explicit ways within
a research article section. By tracking the MD profile of VBDUs within article sec-
tions, we are able to provide a more detailed description of the internal discourse
organization of an article.
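One simple way to locate such shifts is to compare the dimension scores of adjacent VBDUs. The sketch below is illustrative only: the function name is hypothetical and the scores are invented rather than taken from the study.

```python
def largest_shift(profile, dimension):
    """Find the biggest change in one dimension between adjacent VBDUs.

    `profile` is a list of dicts in text order, one per VBDU, mapping a
    dimension name to its score. Returns (index, change), where `index`
    is the position of the VBDU that follows the shift.
    """
    changes = [
        (i + 1, profile[i + 1][dimension] - profile[i][dimension])
        for i in range(len(profile) - 1)
    ]
    return max(changes, key=lambda c: abs(c[1]))


# Invented Dimension 1 scores for three consecutive VBDUs.
profile = [{"D1": -1.0}, {"D1": 0.5}, {"D1": 6.0}]
print(largest_shift(profile, "D1"))
```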
Figure 7.2 plots the multi-dimensional profile for an article from Agricultural
and Forest Entomology. In all, there are 10 VBDUs in this article: three in the Intro-
duction; two in Methods; two in Results; and three in the Discussion. Figure 7.2
plots only scores for Dimensions 1–3. (Dimension 4 is less distinctive in this research
article, and it is excluded to avoid clutter in the figure.)
Similar to the general patterns described in the previous section (see Figure
7.1), Figure 7.2 shows that there are important overall differences in the dimension
scores across the major sections of this research article. In general, the Introduc-
tion is characterized by reference to current knowledge (vs. past events; Dimen-
sion 2), a moderate use of evaluation features (Dimension 1), and a mixed profile
for procedural description (Dimension 3). The Methods section is quite distinc-
tive, with a marked use of past event features (Dimension 2) and procedural de-
scription features (Dimension 3), together with the absence of evaluation features
(Dimension 1). The Results section in this article maintains the focus on past
events (Dimension 2), but shifts to a moderate use of evaluation features (Dimen-
sion 1). Finally, the Discussion section shows a dramatic shift to current knowl-
edge (Dimension 2) and the use of evaluation features (Dimension 1).
However, these characterizations are not consistent across all VBDUs within a
section. Rather, we can track shifts in the multi-dimensional profile across the
VBDUs within an article section, reflecting internal shifts in communicative pur-
pose. The two most dramatic examples of this type are in the Introduction and
Discussion sections of the article.
Figure 7.2 shows that the first VBDU in the Introduction is not evaluative,
somewhat focused on current (as opposed to past) knowledge, and markedly non-procedural.
The actual text of this VBDU is given above in Text Excerpt 7.1. Most
of this passage is written in the present tense (Dimension 2), with a reliance on
existence and simple occurrence verbs (rather than activity or mental verbs). The
most notable characteristic of this VBDU, in comparison to other VBDUs from
biology research articles, is the dense use of noun phrases modified with descrip-
tive attributive adjectives (e.g., various, coniferous, apical, considerable, young,
most, important, whole, growing, reproductive, male, female, active, detailed, great,
seasonal, major).
Figure 7.2 Multi-dimensional profile of a biology research article (AGFOENT01), showing
Dimension 1–3 scores for the 10 VBDUs in the article
In contrast, the second VBDU in this research article shifts to a more evaluative
style (Dimension 1), with a greater focus on current knowledge (Dimension 2)
and procedural presentation (Dimension 3). This VBDU is given as Text Excerpt
7.6 (repeated from Text Excerpt 7.1 above), with selected Dimension 2 and Di-
mension 3 features highlighted. These are mostly verbal features (semantic classes
of verbs, and tense/aspect/voice features).
Text Excerpt 7.6. VBDU #2 from the Introduction section (Agricultural and Forest
Entomology; AGFOENT01I) (Selected Dimension 2 and Dimension 3 features are
shown in bold underlined: communication verbs, activity verbs, mental verbs,
present tense, perfect aspect, progressive aspect, passive voice, time adverbials)
Such variation was partly explained by the typical dynamics of the mite attack
on buds: the colonized buds first swell and enlarge, then stop growing. At
that time, the mites are reproducing within the buds and a high number of
eggs and juveniles can be found inside (fig. 1a). Subsequently, the mites leave
to disperse in the crown, whereas the deformed buds can resume growth to
some extent (fig. 1b). New attacks can be detected within the same year but,
usually, they are less severe. This behaviour of the mite and the reaction of the
tree can be considered as something in between the formation of a true gall
and the defence reaction of the tissues, both phenomena having been de-
scribed for eriophyoid mites (Westphal & Manson, 1996).
The shifts in dimension scores reflect an underlying shift in communicative purpose
across these two VBDUs. In the first VBDU of the introduction, the focus is
on a description of the state of affairs, providing elaborating details with attribu-
tive adjectives; for example:
This cypress is one of the most important tree species for landscape and for-
estry in the whole Mediterranean region (Teissier du Cros, 1999 ). The mites
appear to be associated with fast growing plant tissues, such as the apical
buds of the shoots and the young reproductive organs
In contrast, the second VBDU shifts to a description of events, documenting the
process by which mites attack buds; for example:
the colonized buds first swell and enlarge, then stop growing. At that time, the
mites are reproducing within the buds and a high number of eggs and juveniles
can be found inside (fig. 1a). Subsequently, the mites leave to disperse
A similarly dramatic shift can be observed between the last two VBDUs in the
Discussion section. Figure 7.2 shows that all three VBDUs in the Discussion have
positive Dimension 2 scores, focusing on current knowledge (in contrast to the
VBDUs in the Methods and Results sections, which report past events). However,
in the last VBDU we see a notable shift to a highly evaluative style of discourse,
marked by the high positive score on Dimension 1. This shift represents a switch
from an impersonal summary of findings (VBDU 9) to a first-person discussion of
possible implications, alternative factors, and suggestions for future research
(VBDU 10). Text Excerpt 7.7 presents selected sentences from these two VBDUs
to illustrate this difference.
Text Excerpt 7.7. The last two VBDUs from the Discussion section of a research
article (Agricultural and Forest Entomology; AGFOENT01D)
[VBDU 9, with selected Dimension 2 features marked in bold underline]
[...] This implies the existence of long-term effects of mite infestation, [...] It is
worth noting that female cones attacked by T. juniperinus exhibit a similar reaction.
Scales colonized by mites produce more tissue and assume a typical deformation,
[...] Whereas, in a mature tree, the mites could move between the apical
buds and the cones, they depend on the availability of suitable buds in seedlings
and young trees, which makes survival more difficult. The details of colonization
and re-colonization of trees in the field are not known for T. juniperinus, but
eriophyoid mites are known to spread by wind currents and, occasionally, they
are phoretic on birds and insects (Keifer, 1975; Shvanderov, 1975).
[VBDU 10, with selected Dimension 1 (evaluative) features marked in bold
underline: 1st person pronouns, modal verbs, linking adverbials, and causa-
tive subordination]
In the greenhouse, as in our study, infestation of seedlings may occur via
contact with infested plants because seeds are not infested with mites (Battisti
et al., 2000). [...] Infestation of grafted trees via scions can be another
way of mite spreading; however, this was not the case in our experiment
where mites coming from the rootstocks heavily infested the grafted scions.
Other ways of seedling infestation appear unlikely, because a natural re-infestation
of trees maintained under outdoor conditions during the experiment
was not detected [...] The higher susceptibility of Bolgheri, confirmed in our
study, may represent an economic problem because Bolgheri is the most important
clone on the market, [...] Later colonization in the field may be likely
because trees have a higher number of tips and cones. [...]
The present section has illustrated how the results of multi-dimensional analysis
can be used for detailed investigation of the discourse development of particular
texts; this is done by tracking the multi-dimensional profile of the VBDUs that
constitute the text. As the following sections show, multi-dimensional analysis can
also be used to identify underlying text types, allowing description of generaliza-
ble discourse patterns across the texts of a corpus.
5 Identifying and interpreting the multi-dimensional text types of biology research articles
We have noted several times the two major analytical goals of bottom-up ap-
proaches to discourse organization: to provide a comprehensive linguistic descrip-
tion of discourse units and the flow of discourse within texts; and to describe gen-
eralizable patterns of discourse organization that hold across all texts of the target
corpus. The case study presented in Section 4 above showed the application of a
comprehensive linguistic analysis to describe the systematic patterns of linguistic
variation within the scope of a single research article section. That case study il-
lustrates how multi-dimensional analysis, coupled with a segmentation of a text
into VBDUs, can be used to provide a continuous linguistic profile of language use
over the course of a text, corresponding to text-internal shifts in communicative
purpose. That approach enables a linguistically-motivated description of the dis-
course structure of individual texts. However, such descriptions do not produce
generalizable results; it is not feasible to directly compare these continuous lin-
guistic profiles across all texts of a corpus.
Thus the methodological challenge here is to develop analytical constructs
that can be used to compare the discourse structure of multiple texts, and thus to
identify discourse patterns that can be generalized for all texts in a corpus. The
approach adopted here is based on discrete text types, considering general dis-
course patterns realized by systematic combinations of those text types.
In previous multi-dimensional studies (e.g., Biber 1989, 1995), text types are
identified quantitatively using Cluster Analysis, with the dimensions of variation
as predictors. Cluster analysis groups texts into clusters on the basis of shared
multi-dimensional/linguistic characteristics: the texts grouped in a cluster are
maximally similar linguistically, while the different clusters are maximally distin-
guished. For the application here, the four dimensions of variation are used as
linguistic predictors in the cluster analysis, which identifies groups of VBDUs that
are maximally similar in their linguistic characteristics; these groupings are inter-
preted as VBDU text types.
Cluster analysis is an exploratory statistical technique that groups VBDUs sta-
tistically, based on the scores for all four dimensions. The FASTCLUS procedure
from SAS was used for the present analysis. Disjoint clusters were analyzed be-
cause there was no theoretical reason to expect a hierarchical structure. Peaks in
the Cubic Clustering Criterion and the Pseudo-F Statistic (produced by FAST-
CLUS) were used to determine the number of clusters. These measures are heuris-
tic devices that reflect goodness-of-fit: the extent to which the texts within a cluster
are similar, while the clusters are maximally distinguished. In the present case,
these measures had peaks for the 6-cluster solution.
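The selection of the number of clusters can be sketched in code. The following is a minimal illustration only, and it is not the SAS FASTCLUS procedure used in the study: it assumes a plain k-means style disjoint clustering with a simple farthest-point seeding, and it computes only the pseudo-F statistic (between-cluster variation over within-cluster variation, each divided by its degrees of freedom), leaving out the Cubic Clustering Criterion. All function names are hypothetical.

```python
from math import fsum


def sq_dist(a, b):
    """Squared Euclidean distance between two score tuples."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def centroid(rows):
    """Mean score on every dimension for a non-empty list of rows."""
    n = len(rows)
    return tuple(fsum(col) / n for col in zip(*rows))


def farthest_point_seeds(rows, k):
    """Deterministic seeding (an assumption of this sketch, not FASTCLUS):
    start from the first row, then repeatedly add the row that lies
    farthest from all seeds chosen so far."""
    seeds = [rows[0]]
    while len(seeds) < k:
        seeds.append(max(rows, key=lambda r: min(sq_dist(r, s) for s in seeds)))
    return seeds


def assign(rows, centers):
    """Label each row with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: sq_dist(r, centers[c]))
            for r in rows]


def kmeans(rows, k, max_iter=100):
    """Disjoint (non-hierarchical) clustering into k groups."""
    centers = farthest_point_seeds(rows, k)
    labels = assign(rows, centers)
    for _ in range(max_iter):
        new_centers = []
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(centroid(members) if members else centers[c])
        new_labels = assign(rows, new_centers)
        centers = new_centers
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


def pseudo_f(rows, labels, k):
    """Pseudo-F: (between SS / (k - 1)) / (within SS / (n - k)); needs 1 < k < n."""
    n = len(rows)
    total_ss = fsum(sq_dist(r, centroid(rows)) for r in rows)
    within_ss = 0.0
    for c in range(k):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        if members:
            center = centroid(members)
            within_ss += fsum(sq_dist(r, center) for r in members)
    if within_ss == 0:
        return float("inf")
    return ((total_ss - within_ss) / (k - 1)) / (within_ss / (n - k))


def best_k(rows, candidates):
    """Return the candidate number of clusters with the highest pseudo-F."""
    scores = {}
    for k in candidates:
        labels, _ = kmeans(rows, k)
        scores[k] = pseudo_f(rows, labels, k)
    return max(scores, key=scores.get)
```

Given a list `rows` holding one tuple of four dimension scores per VBDU, `best_k(rows, range(2, 9))` returns the candidate solution at which the pseudo-F value peaks. In practice such a peak is a heuristic guide and would be weighed against the interpretability of the resulting clusters.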
Figure 7.3 shows the distribution of research article VBDUs plotted with re-
spect to Dimension 1 and Dimension 2. This figure shows some distinct clusters of
VBDU text types. For example, the VBDUs in Text Type 1 (on Figure 7.3) have
large positive scores on Dimension 1. The VBDUs in Text Type 5 tend to have large
positive scores on Dimension 2. The other text types are less clearly distinguished
with respect to these two dimensions, but some of them have more distinctive
characterizations with respect to Dimensions 3 and 4.
Figure 7.3 Scores for research article VBDUs along Dimensions 1 and 2, identifying the
text type of each VBDU
Table 7D (at the end of the chapter) provides a descriptive summary of the cluster
analysis results, showing the number of VBDUs grouped into each cluster together
with other statistics on the dispersion of VBDUs within the cluster and the nearest
cluster. The clusters can be interpreted as VBDU Text Types, because each cluster
represents a grouping of VBDUs with similar linguistic profiles. Figure 7.4 pro-
vides a graphic representation of the linguistic differences among the text types,
showing the mean dimension scores for each type. (Table 7E at the end of the
chapter provides the actual mean scores and standard deviations for the dimen-
sion scores of each cluster.)
The clusters differ notably in their distinctiveness: the smaller clusters are
more specialized and more sharply distinguished linguistically. For example, Clus-
ter 1 has only 73 VBDUs (see Table 7D); linguistically, Figure 7.4 shows that the
VBDUs grouped in Cluster 1 have extremely large positive scores on Dimension 1
(Evaluation) and moderately large positive scores on Dimension 2 (Current state
of knowledge). At the other extreme, Cluster 3 is a general MD-Discourse Type:
it is large (538 VBDUs) and relatively unmarked in its dimension scores.
Figure 7.4 Mean dimension scores for the six VBDU text types
Table 7F at the end of the chapter and Figure 7.5 show the distribution of VBDUs
across discourse types (the clusters) and research article sections. The clusters are not
distributed evenly across research article sections. For example, 58 of the 73 VBDUs
in Cluster 1 (or 79.45% of all Cluster 1 VBDUs) occur in Discussion sections. Cluster
2 similarly shows a strong association with a single section: 199 of the 265 VBDUs in
Cluster 2 (or 75.09% of all Cluster 2 VBDUs) occur in Methods sections.
At the same time, Table 7F and Figure 7.5 show that most research article sec-
tions can be composed of discourse units from all six clusters. For example, there
are a total of 458 VBDUs from Discussion sections, broken down as follows:
Cluster 1: 58 (12.66%)
Cluster 2: 8 (1.75%)
Cluster 3: 66 (14.41%)
Cluster 4: 65 (14.19%)
Cluster 5: 140 (30.57%)
Cluster 6: 121 (26.42%)
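The arithmetic behind such a breakdown is simple to reproduce. The sketch below recomputes percentage shares from the raw counts given in the text; the function name is hypothetical, and the only data used are the counts reported above.

```python
def percentage_breakdown(counts):
    """Return each category's share of the total, as a percentage
    rounded to two decimal places."""
    total = sum(counts.values())
    return {key: round(100 * value / total, 2) for key, value in counts.items()}


# VBDU counts for Discussion sections, keyed by cluster number.
discussion_counts = {1: 58, 2: 8, 3: 66, 4: 65, 5: 140, 6: 121}
discussion_shares = percentage_breakdown(discussion_counts)

# The same helper gives the share of a cluster that falls in one section,
# e.g. the 58 of 73 Cluster 1 VBDUs that occur in Discussion sections.
cluster1_shares = percentage_breakdown({"Discussion": 58, "other sections": 73 - 58})
```

The two calls correspond to the two perspectives used in this section: the composition of a section by cluster, and the distribution of a cluster across sections.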
Two major patterns are apparent from Figure 7.5. First, the article sections differ
greatly in the extent to which they rely on different text types. At one extreme,
Results sections tend to rely on VBDUs from a single text type (Cluster 3, inter-
preted below as Description of events); almost 60% of the VBDUs in Results sec-
tions are from this text type. Methods sections are also highly specialized, relying
primarily on only two text types; about 93% of the VBDUs in Methods sections are
from Clusters 2 and 3. In contrast, Introductions and Discussion sections use a
much wider range of text types. The second major pattern has to do with the par-
ticular text types preferred in each article section. For example, VBDUs from Text
Type 3 are especially prevalent in Methods and Results sections, while VBDUs
from Text Type 2 are prevalent only in the Methods section. VBDUs from Text
Type 1 are less common overall and restricted primarily to the Discussion sections
of these articles.
Figure 7.5 Distribution of text types across article sections
Taken together, Figures 7.4 and 7.5 provide the basis for the interpretation of each
VBDU Text Type. These interpretations are refined by consideration of individual
VBDUs from each type (discussed below). Figure 7.4 shows that the four most
distinctive Text Types are defined by especially large scores on one of the four di-
mensions (these scores are shown in bold on Table 7E). VBDUs from Type 1 have
especially large scores on Dimension 1 (Evaluation of possible explanations) plus
relatively large scores on Dimension 2 (Current state of knowledge); Text Excerpt
7.2 above illustrates a VBDU of this type. VBDUs from Type 2 have especially
large positive scores on Dimension 3 (Procedural discourse), together with large
negative scores on Dimension 2 (Past events); Text Excerpt 7.4 above illustrates a
VBDU of this type. VBDUs from Type 5 have especially large positive scores on
Dimension 2 (Current state of knowledge); Text Excerpt 7.3 above illustrates a
VBDU of this type. And VBDUs from Type 6 have especially large positive scores
on Dimension 4 (Abstract/theoretical discussion of concepts); Text Excerpt 7.5
above illustrates a VBDU of this type.
Reflecting these linguistic characteristics, we propose the following tentative
interpretive labels for the six text types in our study:
Text Type 1: Current evaluation of implications and explanations
Text Type 2: Procedural description of past actions and events
Text Type 3: Report of past events
Text Type 4: Abstract elaborated discussion (not evaluative and not procedural)
Text Type 5: Presentation of the current state of knowledge
Text Type 6: Current abstract/theoretical discussion
As the following sections show, biology research articles tend to use these six text
types in systematic combinations, reflecting different underlying patterns of dis-
course organization.
6 Using VBDU text types to describe the discourse organizational patterns of biology research articles
The sections above have described how research articles can be segmented into
discourse units; how factor analysis can be used to identify the underlying dimen-
sions of linguistic variation among these VBDUs; how those dimensions can be
used to track the discourse development of an individual research article; and how
cluster analysis can be used to identify the text types that are defined by the multi-
dimensional linguistic space.
It is further possible to use these constructs to describe the discourse organi-
zation of an individual research article. Figure 7.5 above shows that Research Ar-
ticle Sections and Text Types provide two complementary perspectives on the dis-
course structure of biology research articles, with all six text types occurring in
each section. By identifying the text type of each VBDU in a research article, we
can describe the internal discourse organization of article sections.
To illustrate, Table 7.4 presents an outline of the VBDUs that comprise a re-
search article from the journal Agricultural and Forest Entomology. This same arti-
cle is discussed in Section 4 above, and Figure 7.2 (above) plots the multi-dimen-
sional profile of VBDUs from this article. Table 7.4 is similar in tracking the
linguistic characteristics of each VBDU in the article, but in this case, those VB-
DUs are classified according to their text type (allowing direct comparison to the
discourse patterns of other research articles).
We have already presented several text samples from this research article. Text
Excerpt 7.1 (above) presents the first two discourse units from the article, and Text
Excerpt 7.6 highlights the distinctive linguistic characteristics of VBDU 2. It turns out that
these two VBDUs also represent different text types (Types 4 and 5). The first
VBDU is Type 4 (Abstract/theoretical discussion), characterized mostly by a large
positive score on Dimension 4: Abstract / theoretical discussion of concepts. As
such, this VBDU uses many long words, nominalizations, abstract/process
nouns, and attributive adjectives (e.g., eriophyoid, Trisetacus, juniperinus, coniferous,
Mediterranean, Cupressus, sempervirens, reproductive, meristemes, variation,
abundance). This reliance on technical/abstract vocabulary is absent in the second
VBDU, which is from Type 5 (Presentation of the current state of knowledge). In-
stead we find frequent reference to concrete, tangible nouns, such as mite, bud,
eggs, juveniles, crown, tree, gall. At the same time, the second VBDU illustrates the
use of communication verbs to report previous findings (explained, described),
together with the use of present tense verbs to present currently accepted facts
that provide the background to the present study (e.g., the colonized buds first
swell and enlarge, then stop growing. At that time, the mites are reproducing within
the buds [...] Subsequently, the mites leave to disperse in the crown).
Table 7.4 Outline of the VBDUs in a biology research article (AGFOENT01)
Introduction
  VBDU1   Type 4   Abstract / theoretical discussion (not evaluative and not procedural)
  VBDU2   Type 5   Presentation of the current state of knowledge
Methods
  VBDU4   Type 3   Simple report of past events
  VBDU5   Type 2   Procedural description of past actions and events
Results
Discussion
  VBDU10  Type 1   Evaluation of implications and explanations (within the context of current knowledge)
Table 7.4 shows that there is also a text type shift between the last two VBDUs in the
Discussion section of this research article; Text Excerpt 7.7 above highlights some of
the distinctive linguistic characteristics of those two VBDUs. VBDU 9 is from Type
5, interpreted as presenting the current state of knowledge. In VBDU 9, these linguistic
features are used to present generally accepted facts that can be used to interpret the
particular findings of the present study (e.g., this implies; it is worth noting that fe-
male cones exhibit; Scales colonized by mites produce more tissue and assume a
typical deformation; size and seed quality of the cone is lower than that of healthy cones
(Battisti et al., 2000); eriophyoid mites are known to spread by wind currents and,
occasionally, they are phoretic on birds and insects (Kifer, 1975; Shvanderov, 1975)).
The last discourse unit of this article (VBDU 10) shifts to Text Type 1, which
relies on the evaluative features associated with Dimension 1. As Text Excerpt 7.7
shows, this VBDU is marked by a notable shift to evaluative/interpretive features,
including 1st person pronouns (our), modal verbs (e.g., may, can, should), causa-
tive subordination, linking adverbials (e.g., however), main verb be, and the dense
use of predicative adjectives, often controlling a complement clause (e.g., optimal,
unlikely, present; are able to, be sufficient to). In sum, the analysis of this discussion
section in terms of its composite text types helps to identify a shift in author orien-
tation reflecting an underlying shift in purpose.
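Locating such shifts is mechanical once each VBDU carries a text type label. The sketch below is a hypothetical helper, not part of the original analysis: it walks through an article outline and reports every point where adjacent VBDUs within a section differ in text type. The outline holds only the VBDUs named in Table 7.4 and in the discussion above (VBDU 9 as Type 5); the remaining VBDUs of the article are not listed here.

```python
def text_type_shifts(outline):
    """Report adjacent VBDUs within a section whose text types differ.

    `outline` maps a section name to an ordered list of
    (vbdu_id, text_type) pairs. Each result is a tuple of
    (section, first_id, second_id, first_type, second_type).
    """
    shifts = []
    for section, units in outline.items():
        for (id_a, type_a), (id_b, type_b) in zip(units, units[1:]):
            if type_a != type_b:
                shifts.append((section, id_a, id_b, type_a, type_b))
    return shifts


# Partial outline of AGFOENT01, restricted to the VBDUs named in the text.
agfoent01 = {
    "Introduction": [("VBDU1", 4), ("VBDU2", 5)],
    "Methods": [("VBDU4", 3), ("VBDU5", 2)],
    "Discussion": [("VBDU9", 5), ("VBDU10", 1)],
}
shifts = text_type_shifts(agfoent01)
```

Because the output is a list of discrete transitions rather than a continuous profile, results for different articles can be tallied and compared directly.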
Overall, the text type description is generally in agreement with the multi-dimensional
profile described in Section 4 above. The multi-dimensional profile, as
in Figure 7.2, actually provides more detailed information about the internal discourse
characteristics of an individual research article, because it represents continuous
patterns of use rather than discrete categories. However, the primary advantage
of multi-dimensional profiles (that they capture continuous patterns of use)
is a liability for any attempt to document generalizable patterns of discourse organization
across research articles. In contrast, the analysis of discourse structure
in terms of discrete categories (text types) is ideal for such purposes. The following
sections discuss several of the general discourse patterns that emerge from the
investigation of biology research articles.
7 Starting and ending research article sections
One general research question that can be investigated from a text type perspec-
tive is whether there are preferred ways to begin and end a section in a research
article. In the present study, there are distinctive patterns of discourse in Introduc-
tions, Methods, and Discussion sections. (All VBDUs in Results sections tend to
be Text Type 3, Report of Events. There is thus little internal variation among the
VBDUs within Results sections.)
7.1 Describing the typical discourse organizations of introductions
Figure 7.6 compares the preferred text type used to begin an article introduction
versus the preferred text type for the final VBDU in the introduction. (Only 76
pairs of VBDUs are considered here, because 14 of the research articles in our
corpus had short Introductions, consisting of only a single VBDU.)
Figure 7.6 Preferred text types in Introductions, by position
Figure 7.6 shows that there is considerable variability in the text types used for
article introductions. VBDUs from Text Types 3, 4, 5, and 6 are used both to begin
an article introduction and as the last VBDU in the introduction. At the same
time, there is a difference between the two discourse positions: the preferred text
type used to begin article introductions is Text Type 5: Presentation of the current
state of knowledge. In contrast, Text Type 4: Abstract elaborated discussion is the
preferred type used to end article introductions.
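A positional comparison of this kind amounts to two frequency tallies. The sketch below assumes that each section is available as an ordered list of text type numbers; the function names are hypothetical and the sequences in the usage note are invented for illustration, not taken from the corpus. As in the analysis above, sections consisting of a single VBDU are left out.

```python
from collections import Counter


def first_last_counts(sections):
    """Tally the text type of the first and of the last VBDU in each
    section, skipping sections with fewer than two VBDUs."""
    first, last = Counter(), Counter()
    for text_types in sections:
        if len(text_types) < 2:
            continue
        first[text_types[0]] += 1
        last[text_types[-1]] += 1
    return first, last


def preferred(counter):
    """Return the most frequent text type in a tally."""
    return counter.most_common(1)[0][0]
```

With invented sequences such as `[[5, 3, 4], [5, 4], [4, 5], [6]]`, `first_last_counts` ignores the single-VBDU section and returns one tally for opening VBDUs and one for closing VBDUs; `preferred` then picks the most frequent type in each position.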
That is, the most common discourse pattern for Introductions is to begin with
a summary of the current state of knowledge, and then shift to a more technical/abstract
introduction of the proposed study.3 Text Excerpt 7.8 illustrates a discourse
organization of this type. Notice in particular the reliance on present tense and
perfect aspect in VBDU #1, to establish what has been accomplished in this area of
research to date. Communication and discovery verbs + that-clause constructions
are also typical of this text type (e.g., proposed that, found that). In contrast,
the actual overview of the present study in the last VBDU of the Introduction
shifts to a greater reliance on technical terminology and elaborated noun phrases.
Notice the extremely dense use of long, technical terms, attributive adjectives,
nominalizations, and abstract/process nouns generally in VBDU #3.
3. The research article described above in Table 7.4 (see Text Excerpts 7.1 and 7.6) actually
illustrates the opposite pattern: the introduction begins with a VBDU of Type 4 (Abstract discussion)
and then shifts to Type 5 (Current knowledge).
Text Excerpt 7.8. The first and last VBDUs from the Introduction section (Conser-
vation Biology; CONSBIO06I)
VBDU 1 (the first VBDU in the Introduction)
(Text Type 5: Current Knowledge, with selected Dimension 2 features marked
in bold underlined: present tense, perfect aspect, communication/epistemic
verbs + that-clause)
Habitat fragmentation and edge effects are putative threats to population vi-
ability for a variety of wildlife species. Documented declines of some migra-
tory bird species over the past three decades (REFS) have resulted in numer-
ous studies of how fragmentation affects the nesting success of populations.
[...] Numerous studies of edge effects, defined as an increased probability of
nesting failure near habitat edges, also have been conducted. However, the
impact of edges on nesting success remains unclear. [...] Andersen (1995)
proposed that landscape composition, the amount of different patch types in
the landscape, might explain the variation in results of edge-effects studies
and found that edge effects in Europe are more common in forest-farmland
mosaics than in forest mosaics characterized by stands of varying ages.
VBDU 3 (the last VBDU in the Introduction).
(Text Type 4: Abstract discussion, with selected Dimension 4 features marked
in bold underlined: nominalizations, long words, abstract / process nouns,
attributive adjectives)
However, artificial nests may be subject to different predation pressures than
natural nests (REFS) and may not reflect the nesting success of an actual bird
species (REFS). Thus, critical information about how edges affect the success
of natural nests of birds in heterogeneous landscapes is still lacking. To address
these issues, we evaluated the success of 230 Wood Thrush (Hylocichla
mustelina) nests in edge and interior habitats in both fragmented and contiguously
forested landscapes in central New York. The Wood Thrush, a Nearctic
Neotropical migratory songbird, is an ideal species for separating the
effects of fragmentation and edge. [...] Our objectives were to (1) compare
the abundance and nesting success of Wood Thrushes in fragmented and
contiguous landscapes; (2) compare Wood Thrush abundance and nesting
success in edge and interior habitat in each landscape type; and (3) to use
actual nest data to test the hypothesis that edge effects are stronger in frag-
mented landscapes than in contiguous landscapes.
In sum, we have illustrated here the preferred pattern of discourse organization
within biology research article Introductions. In this case, many other patterns are
possible and in fact commonly occur (as in the case of the shift from an Abstract
Text Type to a Current Knowledge Text Type shown in Table 7.4 above). In con-
trast, the following section shows that Methods sections have a much more strong-
ly preferred pattern of internal discourse organization.
7.2 Describing the typical discourse organizations of methods sections
Figure 7.7 shows that Methods sections in biology articles are quite constrained in
their preferred discourse organizations: Text Type 3 (Report of Events) is strongly
preferred (66% of the time) as the first VBDU within a Methods section, while
Text Type 2 (Procedural Description) is strongly preferred as the last VBDU in this
section (67% of the time), as illustrated in Text Excerpt 7.9.
Figure 7.7 Preferred text types in Methods sections, by position

Text Excerpt 7.9. The first and last VBDUs from the Methods section (Functional
Ecology; FUNCECO05M)
VBDU 3 (the first VBDU in the Methods).
(Text Type 3: Report of Events with active voice verbs marked in bold under-
lined)
Bicyclus anynana (Satyrinae) is a tropical butterfly distributed from southern
Africa to Ethiopia, which feeds on a variety of fallen and decaying fruit, including
that from Ficus trees (REF). A laboratory stock population of B. anynana
was established at Leiden University in 1988 from over 80 gravid females
collected at a single locality in Malawi. [...] Butterflies from this stock population
were used for this study. As in earlier studies (REFS), we used natural
variation in the 13C content of plants to trace the dietary sources of egg carbon.
The 13C content is expressed as the ratio of sample 13C:12C relative to a
limestone carbon standard [FORMULA]. A more positive number indicates
an increased abundance of the heavy isotope, and is referred to as enriched.
[...] By raising butterflies on isotopically contrasting larval and adult diets, we
could easily identify the dietary source of carbon in the eggs. Likewise, plants
may also differ in 15N content, depending on the sources and pathways used
in nitrogen assimilation. [...]
VBDU 5 (the last VBDU in the Methods)

(Text Type 2: Procedural with past tense verbs and selected Dimension 3 fea-
tures marked in bold underlined: passive voice, activity verbs, and mental
verbs)
In both laboratories, egg 13C and 15N were measured using continuous flow
isotope ratio mass spectrometry. A CE Instruments Elemental Analyser (Milan,
Italy) was used to combust and separate sample gases, which were introduced
into a Finnigan Delta XL Plus isotope ratio mass spectrometer in a continuous
stream of helium, via the Conflo II interface (Thermo Finnigan,
Bremen, Germany). C were obtained from the Elemental Analyser for both
experiments; % C and % N were determined for Experiment 2 only. Statistics
Throughout, means are given ± 1 SE. The daily trend in mean egg number was
evaluated with linear regression. Differences in fecundity between fed and
starved females were evaluated with t tests. Differences in egg 13C per day
were evaluated with ANOVA or ANCOVA, and residuals were evaluated for normality
using the Shapiro-Wilks test. Non linear fitting was performed using
least squares minimization (REF).
Methods sections in these research articles tend to follow the same discourse pro-
gression: The first VBDU is a general introduction to the study and the methodol-
ogy. This section mixes past tense and present tense verbs, and it mixes active voice
and passive voice verbs. In many cases, active voice verbs are used with direct ref-
erence to the researchers (e.g., we used, we could identify), while in other cases,
active voice verbs are used to establish the parameters of the study (a positive
number indicates, plants may also differ). In contrast, subsequent VBDUs in Meth-
ods sections tend to shift to Text Type 2, characterized by a dense and consistent
use of passive voice verbs. The researchers are fully backgrounded in these meth-
odological VBDUs, because they are assumed to be the logical subject of every
verb phrase. These verbs generally report the activities that the researchers per-
formed in the study (e.g., measured, introduced, obtained), which also often in-
volve abstract mental processes (e.g., determined, evaluated).
Similar to the preferred text type sequences in Introductions, the pattern de-
scribed here is not an absolute rule for Methods sections. However, the text type
analysis shows that there is a very strong preference for this discourse organization
within these research articles.
7.3 Describing the typical discourse organizations of discussion sections
Finally, we can use the same approach to consider the preferred sequences of text
types in Discussion sections. At a general level, Figure 7.8 shows that Discussions
follow a pattern more similar to Introductions than Methods: many different text
types are used in Discussions, occurring in both initial and final positions. How-
ever, there are certain moderate preferences.
First of all, although Text Type 1, Current evaluation of implications and expla-
nations, is not especially common overall, this text type is more likely to occur as
the last VBDU in the Discussion (14% of the time) than as the first VBDU (7%).
In contrast, Text Type 5, Current Knowledge, is the preferred type used for the first
VBDU in Discussion sections. Text Excerpt 7.7 above (see Table 7.4) illustrates
this discourse organization (shifting from Type 5 to Type 1).
However, it is more common to begin the Discussion with Text Type 5 and then
shift to Text Type 6, Current abstract/theoretical discussion, as the final VBDU in the
Discussion section (33% of the time). Text Excerpt 7.10 illustrates this discourse
pattern. The first VBDU in the discussion section sets the stage again, reminding
readers about the larger context of the study. Present tense and perfect aspect verbs
are prevalent in this VBDU, used to state what we currently know about the topic.
The following VBDUs in the Discussion then shift to a general summary of the
study findings and the broader theoretical implications of those results. In some
cases, that discussion can be evaluative, as in Text Excerpt 7.7 above. However, more
commonly the discussion is abstract and theoretical, as in VBDU 11 in Text Ex-
cerpt 7.10. Notice in particular the use of nominalizations (e.g., correlation, interac-
tion, conclusion, resistance), which are often abstract/process nouns, as well as the
stance noun + that-clause constructions (e.g., the fact that, a hypothesis that),
and the use of long words generally (e.g., benzalkonium, ciprofloxacin). Taken to-
gether, these co-occurring features present a style of discourse that is used for con-
cluding abstract/theoretical discussions of implications.
Figure 7.8 Preferred text types in Discussions, by position
Text Excerpt 7.10. The first and last VBDUs from the Discussion section (Journal
of Applied Microbiology; JAPMICR05D)
VBDU 9 (the first VBDU in the Discussion)
(Text Type 5: Current Knowledge with selected Dimension 2 features marked
in bold underlined: present tense and perfect aspect)
The relationship between antibiotic and antimicrobial biocide (disinfectants)
resistance is currently considered a hot topic (REF). In cases where antibiotic-
resistant organisms have become a serious problem with respect to nosoco-
mial infection, the widespread use of antimicrobial biocides and the introduc-
tion (or enforced compliance) of handwashing and general hygiene measures
normally leads to its amelioration (REFS). If, as has been suggested, antimi-
crobial biocides may be aiding the incidence of the same antibiotic resistant
organisms, then there appears to be a dichotomy of reasoning. The study of
the mean MIC, MIC50 or the MIC90 values for a population does allow general
trends in the change of resistance with time to be highlighted. However, such
a study will fail to answer any questions on potential correlations of the antimicrobials. [...]
VBDU 11 (the last VBDU in the Discussion)

(Text Type 6: Current Abstract/theoretical discussion with selected Dimension
4 features marked in bold underlined: nominalizations, long words, abstract
/ process nouns, attributive adjectives, noun+that complement clause)
Table 5 shows that benzalkonium chloride and ciprofloxacin are negatively
correlated but no correlation was found with gentamycin. The rotated factor,
rpc3, consists of factors from the two QACs and is essentially devoid of any
interaction with any other antimicrobial. Although this compares well with
the study of Jones et al. (1989) who showed cross resistance between QACs
when Ps. aeruginosa was trained to be resistant to a single QAC, the absence
of antibiotic factors and the fact that there is zero correlation between rpc1
and rpc3, suggests that, from this data, that there are no correlations between
QACs and antibiotics. As this latter result was not expected, a study with a
larger data set with more antibiotics is needed to help confirm or change this
conclusion. From the analyses performed here it is very difficult to support a
hypothesis that increased biocide resistance is a cause of increased antibiotic
resistance either in Staph. aureus or in Ps. aeruginosa. [...]
8 Preferred text type sequences across research article section boundaries
The preceding sections have documented the preferred patterns of discourse de-
velopment within the sections of biology research articles. It also turns out that
there are preferred patterns across article sections: the text type of one section can
influence the choice of text type in subsequent sections.
For example, Figure 7.7 in Section 7.2 above shows that Methods sections
typically begin with a Report of Events VBDU (Type 3), followed by Procedural
VBDUs (Type 2). However, Figure 7.7 further shows that some Methods sections
(about 30% of the time) begin directly with a Type 2 (Procedural) VBDU. We can
predict this discourse choice in part by considering the text type of the final VBDU
in the Introduction. As Figure 7.9 shows, when the Introduction ends with Text
Type 5 (Current Knowledge) or Text Type 6 (Abstract Current Knowledge), it is
more likely that the Methods section will begin directly with a Procedural VBDU
(Text Type 2). (That is, only about 25% of Methods sections begin with a Procedural
VBDU when the Introduction ends with a Report of events or an Abstract
VBDU, compared to almost 40% of Methods sections that begin with a Procedural
VBDU when the Introduction ends with a Current knowledge VBDU.)
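The comparison in the preceding paragraph is a conditional proportion: for each text type that can end an Introduction, what share of the following Methods sections opens with a given text type. A minimal sketch follows, assuming each article has been reduced to a pair of text type numbers (last VBDU of the Introduction, first VBDU of the Methods). The function name is hypothetical, and the pairs used to exercise it would need to come from the coded corpus; none are reproduced here.

```python
from collections import defaultdict


def share_starting_with(pairs, target=2):
    """For each text type ending the preceding section, return the
    proportion of following sections whose first VBDU is `target`.

    `pairs` is an iterable of (last_type_of_previous_section,
    first_type_of_next_section) tuples, one per article.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for previous_type, next_type in pairs:
        totals[previous_type] += 1
        if next_type == target:
            hits[previous_type] += 1
    return {prev: hits[prev] / totals[prev] for prev in totals}
```

The default `target=2` corresponds to the Procedural text type discussed above; passing a different value gives the same breakdown for any other opening text type.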
Text Excerpt 7.11 illustrates an article with this discourse pattern. It is interest-
ing to compare this text sample to Text Excerpt 7.9 in Section 7.2 above. Text Ex-
cerpt 7.9 illustrates the typical discourse organization for Methods sections in
these research articles: the first VBDU in the Methods section provides a general
overview of the study and methodology (Text Type 3), followed by the details of
the procedures (Text Type 2). In contrast, we see here the pattern where these dis-
course units have been shifted forward in the article. Thus, in Text Excerpt 7.11,
the last VBDU in the Introduction provides a general overview of the study, in the
context of the authors' own previous research. Then, the first VBDU in the Methods
moves directly into the details of the procedures, using Text Type 2. In this
case, we see how the choice of discourse organization in one section influences the
discourse patterns in subsequent sections.
Figure 7.9 First VBDU in Methods, following different text types used as the last VBDU
in the Introduction
Text Excerpt 7.11. The last VBDU from the Introduction section, plus first VBDU
from the Methods section (Journal of Applied Microbiology; JAPMICR06I,M)
VBDU 3 (the last VBDU in the Introduction)
(Text Type 5: Current Knowledge with selected Dimension 2 features marked
in bold underlined: present tense, epistemic verb + that-clause, communica-
tion verb, verb + to-clause)
In this study, we attempted to determine whether the beer-spoilage ability of
Lact. paracollinoides strains is an intrinsic character of this species. Our pre-
vious study indicates that Lact. paracollinoides has three distinct ribotypes,
represented by three ribopatterns obtained from La2t, La7 and La8 (REFS).
The search for these ribotypes in our database, consisting of various Lactoba-
cillus strains, showed that the ribotype of nonspoilage strain, Lact. brevis
ATCC8291, is identical with that of Lact. paracollinoides La7. This finding led
us to characterize ATCC8291 in an attempt to determine whether Lact. paracollinoides
is an intrinsic beer-spoiler. We also discussed the use of ORF5 as a
genetic marker for differentiating the beer spoilage ability of Lact. paracollinoides.
VBDU 4 (the first VBDU in the Methods).

(Text Type 2: Procedural with past tense verbs and selected Dimension 3 fea-
tures marked in bold underlined: passive voice and activity verbs)
Bacterial strains and growth conditions Lactobacillus strains were grown
anaerobically at 25 C in MRS broth (REF). Anaerobic conditions were generated
by Anaeropack (REF). Cells were stored in MRS broth containing 20%
glycerol at 80 C. Characterization of ATCC8291 PCR assay for identifying Lact.
paracollinoides. The nucleic acid was extracted, as described previously (REF)
with the modification that 1 l glycogen (REF) was added to facilitate ethanol
precipitation of DNA. DNA (100 l) solution was prepared in 10 mmol 1 Tris
buffer (pH 8 0) from 1 ml cell culture. DNA (5 l) extracts were subjected to
PCR assay as templates. [...]
9 Comparing the preferred discourse styles of research journals
Finally, research journals differ in their preferred styles of discourse organization.
For example, we showed above how Methods sections typically begin with a Report
of Events VBDU (Type 3), followed by Procedural VBDUs (Type 2). However,
a relatively large minority of Methods sections (about 30%) begin directly with a
Procedural VBDU. It turns out that this alternative pattern is strongly associated
with particular research journals. Table 7.5 shows that two of the journals included
in our corpus show a strong preference for this pattern: in both Clinical and
Experimental Pharmacology and Physiology and the Journal of Applied Microbiology,
7 of the 10 sampled articles include Methods sections that begin directly with
a Procedural VBDU. Text Excerpt 7.11 in the previous section illustrates this pattern.
In contrast, the other eight research journals in our corpus rarely adopt this
discourse organization.
A second example of this type comes from the Discussion sections. As shown
in Section 7.3 above, a notable minority (about 15%) of biology research articles
use a Current evaluation VBDU (Type 1) to end the Discussion section (see Figure
7.8). It turns out that this pattern is especially common in one research journal,
Agricultural and Forest Entomology, where 5 of the 10 sampled articles follow it; Text
Excerpt 7.7 in Section 4 above illustrates this discourse style. In contrast, most
other journals rarely follow this pattern.
It is interesting to note that these marked discourse patterns for Methods and
Discussion sections are in complementary distribution in our corpus: Journals
that rely on a strictly Procedural Methods section rarely end the Discussion with a
Current evaluation VBDU (see Table 7.5). Figure 7.10 shows that the journals also
differ generally in their preferred text types in Discussion sections. A comparison
of Table 7.5 and Figure 7.10 indicates that these stylistic preferences are more general.
At one extreme, the Journal of Avian Biology rarely begins Methods sections
with a procedural VBDU (only 10% of the time); about 62% of the VBDUs in
Discussion sections from this journal express current evaluation or current ab-
stract discussion. The journal Clinical and Experimental Pharmacology and Physi-
ology illustrates a quite different preferred discourse style: Methods sections usu-
ally begin with a Procedural VBDU (70% of the time), and Discussion sections rely
most heavily on Current knowledge VBDUs (Type 5; 53% of the time).
Our corpus sample for each of these academic journals is small (only 10 arti-
cles per journal), and thus minor differences in the discourse patterns across jour-
nals must be interpreted with caution. However, the findings here indicate large
and systematic differences in the preferred discourse patterns of different research
journals, probably reflecting their more general communicative purposes and pri-
orities. Future research of this type, based on larger corpus samples, is needed to
provide details of these patterns.
Table 7.5 Stylistic differences across research journals: Proportion of articles from each
journal with a marked discourse preference
Research journal                                        % of Methods sections     % of Discussion sections
                                                        beginning w/ Procedural   ending w/ Evaluation
                                                        VBDU (Type 2)             VBDU (Type 1)
Agricultural and Forest Entomology                       0                        50
Journal of Avian Biology                                10                        30
Conservation Biology                                     0                        10
Functional Ecology                                      10                        10
Annals of Human Genetics                                20                        10
International Journal of Plant Sciences                 20                        10
Journal of Medical Primatology                          30                        20
Journal of Anatomy                                      30                         0
Journal of Applied Microbiology                         70                         0
Clinical and Experimental Pharmacology and Physiology   70                         0
Figure 7.10 Stylistic variation across four research journals: Preferred text types in Dis-
cussion sections
10 Conclusion
The present chapter has illustrated the bottom-up approach to integrating the
strengths and goals of corpus analysis and discourse analysis. This approach allows
the consideration of the internal discourse structure of individual texts based on
generalizable units of analysis identified through empirical analysis of a large cor-
pus. Specifically, we used corpus-based analysis to identify and interpret the types
of discourse units commonly found in a corpus of biology research articles. We
then showed how that analysis could be applied to describe the internal discourse
organization of particular articles and sections, as well as the more generalizable
organizational patterns of research articles and different journals.
The research approach described here has several advantages: Most impor-
tantly, the approach relies on empirical analysis of a large corpus, and on compre-
hensive linguistic analysis of the discourse units. The results are therefore replica-
ble, and the findings can be interpreted as generalizable patterns that are
representative of this genre. When an individual text is analyzed in terms of these
constructs, the discourse patterns are readily comparable to the analyses of other
texts from the same genre. As a result, these constructs can be applied to identify
the generally preferred discourse patterns of a genre.
At the same time, the approach has disadvantages. Most importantly, the dis-
course units themselves are identified primarily on the basis of word use, disre-
garding more subtle signals of a shift in purpose that might be noticed by a human
analyst. Thus, we would certainly not argue that corpus-based analysis of discourse
unit types should replace more conventional discourse analytic techniques. How-
ever, we hope that the present study has shown the usefulness of a bottom-up
corpus-based approach to discourse.
Table 7A Statistical output from the factor analysis of biology research articles:
Eigenvalues for the first six factors
Factor Eigenvalue Difference Proportion Cumulative
1 4.81539474 2.82633326 0.1267 0.1267

2 1.98906147 0.05942658 0.0523 0.1791
3 1.92963489 0.29061590 0.0508 0.2298
4 1.63901899 0.13399396 0.0431 0.2730
5 1.50502503 0.18690466 0.0396 0.3126
6 1.31812037 0.01730357 0.0347 0.3473
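The Difference, Proportion, and Cumulative columns of Table 7A follow arithmetically from the eigenvalues. The sketch below recomputes them. It assumes that each proportion is the eigenvalue divided by the number of variables in the factor analysis, and that this number is 38; the figure 38 is not stated in the table and is inferred here only because it reproduces the printed proportions (4.8154 / 38 = 0.1267). The function name is illustrative.

```python
# Eigenvalues for the first six factors, copied from Table 7A.
EIGENVALUES = [4.81539474, 1.98906147, 1.92963489,
               1.63901899, 1.50502503, 1.31812037]

# Assumption: 38 variables entered the analysis (inferred, not stated in the table).
N_VARIABLES = 38


def variance_table(eigenvalues, n_variables):
    """Return rows of (factor, eigenvalue, difference, proportion, cumulative)."""
    rows = []
    cumulative = 0.0
    for i, eigenvalue in enumerate(eigenvalues):
        proportion = eigenvalue / n_variables
        cumulative += proportion
        # Difference to the next eigenvalue; unknown for the last listed factor.
        if i + 1 < len(eigenvalues):
            difference = eigenvalue - eigenvalues[i + 1]
        else:
            difference = None
        rows.append((i + 1, eigenvalue, difference,
                     round(proportion, 4), round(cumulative, 4)))
    return rows


for row in variance_table(EIGENVALUES, N_VARIABLES):
    print(row)
```

Under this assumption the six factors together account for about 35% of the variance, which matches the final Cumulative value in the table.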
Table 7B Statistical output from the factor analysis of biology research articles:
Inter-factor correlations
Factor 1 Factor 2 Factor 3 Factor 4
Factor 1 1.00000 0.33704 0.28207 0.16770

Factor 2 0.33704 1.00000 0.23946 0.21578
Factor 3 0.28207 0.23946 1.00000 0.12994
Factor 4 0.16770 0.21578 0.12994 1.00000
Table 7C Statistical output from the factor analysis of biology research articles:
Rotated factor pattern (Promax rotation)
Features with high loadings on Factor 1:

predicative adjectives 0.59725 0.21374 0.18694 0.08250
copula be 0.53596 0.02230 0.06724 0.18825
adjective + to complement clause 0.53167 0.04448 0.09421 0.04828
adverbs 0.49067 0.03961 0.14722 0.15303
prediction modals 0.47859 0.07542 0.03827 0.02015
conjuncts 0.42739 0.16989 0.05834 0.03781
possibility modals 0.41064 0.20298 0.03595 0.20525
causative adverbial clauses 0.40932 0.06489 0.04717 0.09738
conditional adverbial clauses 0.36147 0.11082 0.10658 0.02544
adjective + that complement clause 0.33134 0.06686 0.06864 0.01815
pronoun it 0.31604 0.31504 0.05425 0.12020
first person pronouns 0.30642 0.10662 0.13475 0.18895
third person pronouns 0.26277 0.02551 0.17277 0.08850
necessity modals 0.25684 0.09208 0.05455 0.18069
nouns 0.50186 0.27915 0.03416 0.21725

communication verbs 0.10369 0.59439 0.28658 0.02054
communication verb + that complement clause 0.06108 0.55623 0.01099 0.03151
present tense 0.26291 0.52945 0.29089 0.09118
perfect aspect 0.12455 0.52518 0.05667 0.04949
epistemic verb + that complement clause 0.09052 0.44476 0.07416 0.00740
relative clauses 0.01880 0.35085 0.14433 0.03247
demonstrative pronouns 0.12527 0.32147 0.05153 0.08789
verb + to complement clause 0.09671 0.28786 0.12872 0.09367
past tense 0.11086 0.45245 0.53318 0.14306
clausal coordination 0.23437 0.35013 0.22316 0.09040
concrete nouns 0.12157 0.19192 0.12353 0.08479

passive voice 0.11597 0.00543 0.68231 0.14811
activity verbs 0.05540 0.00949 0.65759 0.05539
mental verbs 0.29240 0.08361 0.43665 0.11977
progressive aspect 0.05395 0.05294 0.42232 0.11338
time adverbials 0.04528 0.22996 0.37108 0.24231

nominalizations 0.00750 0.16825 0.05314 0.72610
word length 0.01948 0.05385 0.06320 0.62795
abstract nouns 0.00163 0.04929 0.10624 0.47413
process nouns 0.13612 0.09724 0.04659 0.45618
cognitive nouns 0.06819 0.04338 0.33553 0.40226
attributive adjectives 0.04460 0.01662 0.36044 0.37436
noun + that complement clause 0.06473 0.18228 0.11090 0.24678
Table 7D Cluster summary
Cluster   Freq.   RMS Std     Max. Distance from    Nearest   Distance Between
                  Deviation   Seed to Observation   Cluster   Cluster Centroids
1          73     3.1         12.7                  5         7.8
2         265     2.7         14.1                  3         6.9
3         538     2.5         11.1                  2         6.9
4         180     2.5         12.0                  6         6.4
5         249     2.7         13.0                  6         6.6
6         198     2.7         12.6                  4         6.4
Table 7E Descriptive statistics for the dimension scores of clusters.

[Especially large dimension scores are shown in bold.]
Dimension 1 Dimension 2 Dimension 3 Dimension 4
Means SD Means SD Means SD Means SD
Cluster 1 8.18 2.81 3.93 3.47 1.18 2.98 0.58 2.96

Cluster 2 1.91 1.99 4.09 2.70 5.94 3.01 0.59 3.05
Cluster 3 0.97 2.00 2.91 2.94 0.55 2.26 2.26 2.51
Cluster 4 0.45 2.05 0.86 2.91 4.43 2.13 3.03 2.71
Cluster 5 1.21 2.35 7.35 3.32 1.32 2.46 0.55 2.40
Cluster 6 1.07 2.26 3.47 2.64 0.31 2.82 4.66 2.89
Table 7F Distribution of VBDUs across clusters (MD discourse types) and research article sections
Research Article Sections*
Introduction Methods Results Discussion Total
Cluster 1
Frequency 6 2 7 58 73
Percent 0.40 0.13 0.47 3.86 4.86
Row % 8.22 2.74 9.59 79.45
Column % 2.52 0.47 1.84 12.66
Cluster 2
Frequency 5 199 53 8 265
Percent 0.33 13.24 3.53 0.53 17.63
Row % 1.89 75.09 20.00 3.02
Column % 2.10 46.71 13.91 1.75
Cluster 3
Frequency 48 199 225 66 538
Percent 3.19 13.24 14.97 4.39 35.80
Row % 8.92 36.99 41.82 12.27
Column % 20.17 46.71 59.06 14.41
Cluster 4
Frequency 59 10 46 65 180
Percent 3.93 0.67 3.06 4.32 11.98
Row % 32.78 5.56 25.56 36.11
Column % 24.79 2.35 12.07 14.19
Cluster 5
Frequency 74 1 34 140 249
Percent 4.92 0.07 2.26 9.31 16.57
Row % 29.72 0.40 13.65 56.22
Column % 31.09 0.23 8.92 30.57
Cluster 6
Frequency 46 15 16 121 198
Percent 3.06 1.00 1.06 8.05 13.17
Row % 23.23 7.58 8.08 61.11
Column % 19.33 3.52 4.20 26.42
Total
Frequency 238 426 381 458 1503
Percent 15.83 28.34 25.35 30.47 100.00
* Clusters 1–6 = multi-dimensional discourse types
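The three percentages reported for each cell of Table 7F can be recomputed from the frequencies alone. The following sketch assumes the usual cross-tabulation definitions: Percent is the cell count over the grand total, Row % is the cell count over the cluster total, and Column % is the cell count over the section total. The variable and function names are illustrative.

```python
# Cell frequencies copied from Table 7F (rows: clusters 1-6; columns: sections).
SECTIONS = ["Introduction", "Methods", "Results", "Discussion"]
FREQUENCIES = {
    1: [6, 2, 7, 58],
    2: [5, 199, 53, 8],
    3: [48, 199, 225, 66],
    4: [59, 10, 46, 65],
    5: [74, 1, 34, 140],
    6: [46, 15, 16, 121],
}


def crosstab_percentages(frequencies):
    """Return {(cluster, section_index): (percent, row_pct, col_pct)}."""
    grand_total = sum(sum(row) for row in frequencies.values())
    column_totals = [sum(col) for col in zip(*frequencies.values())]
    result = {}
    for cluster, row in frequencies.items():
        row_total = sum(row)
        for j, count in enumerate(row):
            result[(cluster, j)] = (
                round(100 * count / grand_total, 2),       # Percent
                round(100 * count / row_total, 2),         # Row %
                round(100 * count / column_totals[j], 2),  # Column %
            )
    return result


percentages = crosstab_percentages(FREQUENCIES)
# Cluster 1 in Discussion sections: (Percent, Row %, Column %)
print(percentages[(1, SECTIONS.index("Discussion"))])
```

Read this way, the Row % values show where each discourse type tends to occur, while the Column % values show the make-up of each section.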
Chapter 8
Vocabulary-based discourse units in university class sessions
By Eniko Csomay
During the past four decades, classroom talk has been investigated from
multiple perspectives. Several studies have looked at the general structural and
interactional patterns of individual class sessions (Cazden, 1986; Chaudron,
1988; Long & Sato, 1983; Mehan, 1979; Sinclair & Coulthard, 1975), including
foreign language classrooms. More recently, the socio-cultural aspects of
classroom interaction in K-12 settings have also been investigated (Poole, 2005;
Wells, 1999).
Linguistic studies of academic lectures reflect the general interest in the
lexical, rhetorical, and topical structures of discourse. For example, Nattinger
and DeCarrico (1992) identify recurring multi-word sequences (lexical phrases)
in lectures, classifying them into global or local discourse organizers depending
on their discourse functions. As for rhetorical patterns, disciplinary differences
are described by identifying coherent sub-units in lectures (e.g., phases by
Young, 1994) or through the discourse organization of lectures in varying
disciplines (e.g., Dudley-Evans, 1994b). Hansen (1994) concludes that tracing
discourse markers that signal topic shifts is a most useful method in determining
topic shifts in lectures.
With the availability of large collections of texts, corpus-based linguistic
studies on spoken academic language use have also emerged. Two broad
approaches have been pursued. One approach uses corpora to provide
comprehensive linguistic descriptions of language use in academic contexts
(see, e.g., Biber, 2003). In these studies, the linguistic characteristics of class
sessions have been compared to those of other academic and non-academic
registers (e.g., textbooks, Biber, Conrad, Reppen, Byrd, & Helt, 2002; face-to-face
conversation, Csomay, 2006). A second approach has been taken by scholars
who look at individual linguistic features and discuss their functional variants
across a number of contexts, such as university class sessions (e.g., pronouns by
Fortanet, 2004; reflexivity by Mauranen, 2001; idioms by Simpson & Mendis,
2003; evaluative adjectives by Swales & Burke, 2003).
Relatively few previous studies have investigated the discourse organization
of academic class sessions (e.g., Young, 1994 and Dudley-Evans, 1994b, noted
above), and none of these have been based on large-scale corpus analysis. As a
result, these previous studies do not address the major research goals of bottom-
up corpus-based studies of discourse organization: to provide a comprehensive
linguistic description of discourse units and the flow of discourse within texts,
and to describe generalizable patterns of discourse organization that hold across
all texts of the target corpus (see Chapter 6). The present chapter complements
previous studies by adopting a bottom-up corpus-based approach, motivated by
these two research goals (see also Csomay, 2002; 2005a; 2005b).
1 From constructing a corpus of VBDUs to identifying VBDU text-types
The study of classroom discourse reported in the present chapter follows the same
methodological steps as in Chapters 6 and 7:
Step 1: Construct a corpus of VBDUs, by segmenting complete classroom
teaching sessions into smaller discourse units;
Step 2: Analyze the linguistic characteristics of classroom VBDUs applying
multi-dimensional analysis techniques;
Step 3: Identify and interpret VBDU types via cluster analysis;
Step 4: Analyze classroom teaching sessions as sequences of VBDUs and
VBDU types.
1.1 Constructing a corpus of VBDUs
The study here is based on analysis of the classroom teaching texts included in the
T2K-SWAL Corpus (see Biber et al. 2004, Biber 2006b), supplemented by a few
class sessions from the MICASE Corpus. The corpus includes class sessions from
six major academic disciplines and three levels of instruction (lower division un-
dergraduate, upper division undergraduate, and graduate). The sub-corpus of 196
class sessions was segmented into Vocabulary-Based Discourse Units (VBDUs)
using the techniques described in Chapter 6. Table 8.1 shows the breakdown of
VBDUs across academic disciplines.
Only VBDUs longer than one hundred words were considered for further linguistic
analyses. The average VBDU length was 231 words, ranging from a minimum
of 101 words to a maximum of 1031 words. As Table 8.1 shows, a total of
5,847 VBDUs of more than 100 words were included in the analysis.
Table 8.1 Breakdown of class sessions and vocabulary-based discourse units by discipline
Discipline          Number of class sessions   Number of words   Number of VBDUs
Business             36                          236,400           1,021
Education            16                          137,200             565
Engineering          35                          210,900             941
Humanities           39                          300,200           1,214
Natural Sciences     31                          219,000             901
Social Sciences      39                          294,400           1,205
Total               196                        1,398,100           5,847
1.2 Analyzing the linguistic characteristics of VBDUs applying MD analytical techniques
Step 2 identified above is to analyze the linguistic characteristics of classroom VBDUs
through a multi-dimensional (MD) analysis. That is, as in the preceding
chapters, dimensions of variation are identified through a factor analysis, based
on the distribution of c. 100 linguistic variables across the 5,847 VBDUs from the
classroom teaching corpus.
In preparation for the multi-dimensional analysis, a set of lexico-grammatical
and interactional features were identified as potentially important. Besides the lin-
guistic features found to be important in earlier MD studies of academic language
use (Biber, 2003; Biber et al., 2004), five interactional features were also included
in the analysis (see also Csomay, 2005b): the number of turns, for teachers versus
students; and the average turn length, for teachers versus students. Turns were
defined adopting Tao's broad definition: "any speaker change will be treated as a
new turn" (2003: 189). In addition, the analysis included a count of 1-word turns,
such as "right" or "good".
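The five interactional features can be illustrated with a short sketch. It is not the procedure used in the study: it assumes that a VBDU is available as a list of (speaker, utterance) pairs labelled "Teacher" or "Student", that words are separated by whitespace, and that consecutive utterances by the same speaker belong to one turn, in line with the definition of a turn as any speaker change. The function name and the dictionary keys are hypothetical.

```python
def interactional_features(utterances):
    """Compute turn counts, mean turn lengths, and 1-word turns for one unit.

    `utterances` is a list of (speaker, text) pairs; speaker labels other than
    "Teacher" and "Student" are ignored in the per-speaker figures.
    """
    # Merge consecutive utterances by the same speaker: a new turn starts
    # only when the speaker changes.
    turns = []
    for speaker, text in utterances:
        words = text.split()  # assumption: whitespace tokenization
        if turns and turns[-1][0] == speaker:
            turns[-1][1].extend(words)
        else:
            turns.append((speaker, list(words)))

    def lengths(label):
        return [len(words) for speaker, words in turns if speaker == label]

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    teacher, student = lengths("Teacher"), lengths("Student")
    return {
        "teacher_turns": len(teacher),
        "student_turns": len(student),
        "teacher_mean_turn_length": mean(teacher),
        "student_mean_turn_length": mean(student),
        "one_word_turns": sum(1 for _, words in turns if len(words) == 1),
    }


example = [
    ("Teacher", "what do we mean by overt behavior"),
    ("Student", "obvious"),
    ("Teacher", "pardon"),
    ("Student", "obvious"),
]
print(interactional_features(example))
```

In a full analysis these raw counts would presumably be normalized for VBDU length before entering the factor analysis; that step is omitted here.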
The final factor analysis in this case has three factors, with 41 of these linguis-
tic variables being retained. Table 8.2 shows the linguistic features co-occurring on
each factor.
Table 8.2 Dimensions of variation identified in university classroom talk (Csomay 2005b)
Dimension 1: Contextual orientation versus Conceptual, informative focus
Non-past tense                              .780
First and second person pronoun             .570
Modals                                      .502
Non-passive constructions                   .499
Contractions                                .452
Third person pronoun (reduced forms)        .410
Adverbial clauses: conditional              .371
Verb-initial lexical bundles                .370
Action verbs in directive forms             .365
Commonly used vocabulary                    .346
Activity verbs                              .307
Word length                                -.620
Nouns                                      -.523
Past tense                                 -.424
Prepositions                               -.409
Attributive adjectives                     -.384
Words used in one lecture only             -.350
Nominalization                             -.323
Dimension 2: Personalized framing
That deletion                               .777
Mental verbs                                .724
Factual verbs with that                     .621
I mean                                      .570
You know                                    .544
(Non-passive constructions                  .482)*
(Past tense                                 .422)
Likelihood verbs with that                  .413
Third person pronoun (he, she, they)        .369
no negative features
Dimension 3: Interactive dialogue versus Teacher monologue
Student turns                               .824
Teacher turns                               .760
One word turns                              .564
Discourse particles                         .456
Turn length (teacher)                      -.611
* The features in parentheses appear on more than one factor with a loading over the set
threshold of .3.
Dimension 1 is interpreted as Contextual orientation versus Conceptual, informative
focus. Most of the features on the positive side of Dimension 1 are associated
with a context-dependent interaction, such as first and second person pronouns,
contractions, third person pronouns, activity verbs, commonly used words, and
conditional clauses. This side of Dimension 1 marks interactive instructional dis-
course where participants are actively involved and where reference is made to the
immediate context. In contrast, the features co-occurring on the negative side of
Dimension 1 (e.g., nouns, prepositional phrases, attributive adjectives) are associ-
ated with a conceptual, informational focus. Overall, this dimension is consistent
with Biber's (1988, 1995) first dimension of variation across speech and writing,
Informational versus Involved production circumstances; it also supports previous
studies that have found that university classroom talk exhibits features of both
face-to-face conversation and academic prose (Csomay 2006).
Dimension 2, Personalized framing, reflects the way in which participants
formulate their ideas on-line, overtly expressing their own personal perspective
that frames statements. For example, mental verbs (e.g., think, know, guess) ex-
press the personal state of mind. Many of these verbs are used with that-clauses,
where the complementizer that is often omitted, and the controlling verb express-
es the personal framing relative to the information in the that-clause. On Dimen-
sion 2, we find especially factual verbs (e.g., know, mean, realize) with that-clauses.
These are often used to identify the shared background knowledge of a class (e.g.,
I mean we know it's going to be X plus one). Likelihood verbs controlling that-clauses
also co-occur on Dimension 2, as in:
but I certainly don't think [0] these data indicate that we are more lenient to
women uh but I also think [0] they raise a serious question about whether or not
we're more punitive to them too.
Finally, the co-occurring features on Dimension 3 are interpreted as reflecting
Interactive dialogue versus Teacher monologue. There are few features on this dimension.
The positive features reflect intense turn-taking patterns and discourse particles.
In contrast, the negative side of this dimension is characterized by long
teacher turns, indicating a more monologic style of presentation.
1.3 VBDUs and dimension scores: the multi-dimensional profile of the first three VBDUs of a business management class
By calculating the dimension scores for each VBDU, we are able to determine the
extent to which each discourse unit relies on the co-occurring linguistic features
associated with each dimension. For example, Figure 8.1 below displays the multi-
dimensional profile of the first three VBDUs of an upper division Business Man-
agement class.
[Figure 8.1: chart of dimension scores (x-axis: Dimension scores, -10 to 10) by VBDU sequence (y-axis: VBDUs 1 to 3). Plotted values: VBDU 1: Dimension 1 = 5.98, Dimension 2 = -3.83, Dimension 3 = 4.95; VBDU 2: Dimension 1 = -0.12, Dimension 2 = -1.52, Dimension 3 = 0.84; VBDU 3: Dimension 1 = -8.84, Dimension 2 = -2.47, Dimension 3 = 0.64]
Figure 8.1 VBDUs and the three dimensions of linguistic variation in university class-
room talk in an upper division Business Management class session
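The dimension scores discussed in this section can be sketched computationally. The example below follows the convention common in multi-dimensional analysis: each feature's rate of occurrence is standardized across all units, and a unit's score on a dimension is the sum of its standardized scores for the positively loading features minus the sum for the negatively loading features. This is an illustration under those assumptions, not the exact computation used in the study; the four feature names are taken from Table 8.2, while the rates (treated here as counts per 1,000 words) are invented for the example.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert raw rates to z-scores across all units."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return [0.0 for _ in values]  # a constant feature carries no information
    return [(v - m) / s for v in values]


def dimension_scores(units, positive, negative):
    """Sum of z-scores for positive features minus sum for negative features."""
    z = {f: standardize([u[f] for u in units]) for f in positive + negative}
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(len(units))
    ]


# Hypothetical rates per 1,000 words for three discourse units.
units = [
    {"contractions": 30.0, "modals": 20.0, "nouns": 150.0, "prepositions": 60.0},
    {"contractions": 15.0, "modals": 12.0, "nouns": 220.0, "prepositions": 90.0},
    {"contractions": 3.0, "modals": 5.0, "nouns": 310.0, "prepositions": 130.0},
]

scores = dimension_scores(
    units,
    positive=["contractions", "modals"],  # positive Dimension 1 features
    negative=["nouns", "prepositions"],   # negative Dimension 1 features
)
print([round(s, 2) for s in scores])
```

A unit with frequent contractions and modals and few nouns and prepositions receives a high positive score, and a unit with the opposite profile receives a large negative score, which is the kind of contrast seen between VBDU 1 and VBDU 3.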
The following text examples illustrate how lexico-grammatical and interactional
features function in these three VBDUs, representing the opening discourse of a
classroom teaching session. Figure 8.1 shows that VBDU 1 is marked by large
positive scores on Dimension 1 and Dimension 3, and a large negative score on
Dimension 2. In Text 8.1 below, the linguistic features on the positive side of Di-
mension 1 (e.g., first and second person pronouns, modals, verbs in non-past
tense) are bold italicized. The negative Dimension 2 score of this VBDU corre-
sponds to the relative absence of personal framing features; thus there are no Di-
mension 2 features to highlight in Text 8.1. Finally, the large positive score for
Dimension 3 reflects the large number of different turns in this VBDU, by both
students and teachers.
Text 8.1: VBDU 1 from a business classroom teaching session
(Positive Dimension 1 features marked in bold italics)
Teacher: to summarize and give the key points in the hot stove rule for us on
page three twenty one. um somebody volunteer to do that start
looking it up so I
Student: hot stove page three twenty one?
Teacher: yeah. you've been real good about doing it but you
Student: oh, oh somebody else can do it.
Teacher: well I'm going to say I don't mind you're doing a nice job but we
ought to give some of these other brilliant folks a chance. anybody
want to summarize the hot stove rule for us?.. Ok. you do it. I don't
want to make anybody do it.
Student: alright. page three twenty one.
Teacher: yeah.... we had a I know one answer marked wrong on my crib sheet
I gave you um [xxx] so I'm going to save about six or eight minutes
to review that uh so um nobody else give me a paper to correct and
we'll [xxx] at the end of the class.
In this first VBDU, the teacher starts off by prompting students to summarize key
points from an earlier discussion. Direct reference is made to the context by refer-
ring to the speaker and addressee (I and you) and to specific pages in the textbook.
The discourse is also highly interactive. This VBDU functions primarily to address
class and instructional management issues.
The second VBDU in this class session uses features from both poles of Di-
mension 1, but few Dimension 2 features. This VBDU has one long teacher turn,
followed by several shorter turns by both students and the teacher. In this VBDU,
the teacher outlines the structure of the present class session. Then he introduces
the first topic for the day, evoking students' background knowledge through a
question-answer sequence on what the content of the chapter is.
Text 8.2: VBDU 2
(Positive Dimension 1 features are marked by bold italics; negative Dimen-
sion 1 features are marked by bold.)
Teacher: and uh usually I stick pretty close to the text in going through
the lesson somewhere along the line some people have said that
helps them follow the material and notes but today there's a lot of
material in that chapter that sort of clutters it up. it's more cluttered
and worthwhile so I'm going to while an abbreviated lecture
notes that I have and I will skip quite a few of those what I think are
less important items in there we will have a brilliant dissertation
on the hot stove rule and then we will take uh six eight minutes to
talk about the case and uh which I think has some value uh with
that are we about ready to go? ok. what what's our chapter about
today?
Student: managing conflict and stress.
Teacher: and the text gives us a definition of conflict which may be a little
different than the one that when your big brother used to beat you
up you thought about conflict. um what does the text say about
conflict, what it is? somebody.
Student: overt behavior that results when an individual or group of indi-
viduals thinks they perceive need or needs of the individual or
group has been blocked or is about to be blocked.
Teacher: first of all it's overt behavior. What do we mean by overt behavior?
Student: obvious
Teacher: pardon?
Student: obvious
Teacher: obvious or real. it's just real behavior, uh and it happens when an
individual or groups of individuals think what do they think or
perceive?
Finally, the third VBDU has a large negative score on Dimension 1, a negative
score on Dimension 2, and a score near 0.0 on Dimension 3 (see Figure 8.1). Text
8.3 highlights the dense use of negative Dimension 1 features, including nouns,
attributive adjectives, and prepositional phrases. In this case the instructor takes
several longer turns, presenting course content, interspersed with short responses
from students.
Text 8.3: VBDU 3
(Positive Dimension 1 features are marked by bold italics; negative Dimen-
sion 1 features are marked by bold; positive Dimension 2 features are CAPI-
TALIZED; positive Dimension 3 features are marked by underlined italics;
negative Dimension 3 features are underlined)
Student: things have been blocked from them.

Teacher: that something that they think is important something is
blocking them from obtaining that need of what they desire.
The text says uh well we just talked about it uh when a perceived
need has been blocked made unobtainable apparently unobtain-
able. and why do conflicts arise? what causes conflicts to occur?
Student: uh disagreements.
Teacher: Ok.... it's real simple. different people have different perceptions of
their needs and uh their beliefs and goals and uh these differ-
ent perceptions generate these conflicts. uh out in the business
world and this was particularly true uh according to my observa-
tion with my thirty some years working with a large corporation
several of them, uh that managers have some beliefs about conflict
and they're quite often uh don't true up with what our text
describes. what are some of the common beliefs about conflict?
Student: [xxx]
Teacher: that's the main thing. many managers, particularly managers feel
like (that) by gosh you just shouldn't have a lot of conflict in your
organization, and many of these managers say I'm just not going to
put up with a lot of conflict. they try to squash it and keep it uh
undercover and keep it down. well, thank goodness the idea about
conflict being so bad is in the process of changing. they also
think that conflict results from personality problems. you
wouldn't have conflicts in your group or your organization if you
didn't have a bunch of people uh that are less than uh normal don't
have their head screwed on right. and conflict uh produces inap-
propriate uh reactions of other people involved.
To summarize, this section illustrates how VBDUs can differ in their dimension
scores, specifically comparing the first three VBDUs from an upper division Busi-
ness Management class. In previous research, Csomay (2005b) describes how the
first VBDUs in a class session are often similar to one another in their linguistic
characteristics. In the present case, though, we see relatively large differences
across these VBDUs, as the instructor shifts from class management tasks to ac-
tual course content more quickly than in many class sessions.
2 Dimension scores and VBDU text-types
As in the preceding chapters, cluster analysis is used next to identify groupings of
VBDUs with similar linguistic profiles: the VBDU Types. Each VBDU has a specific
characterization within the three-dimensional space created by the dimension
scores. Figure 8.2 illustrates this in a three-dimensional scatter plot. Cluster
analysis¹ produces groupings of VBDUs based on two major considerations: 1) the
VBDUs grouped into a cluster are maximally similar to one another in their linguistic
characteristics in this 3-dimensional space, and 2) the different clusters are
maximally different from one another. In the present case, the solution for four
clusters best captured the patterns of variation among these VBDUs, and Figure
8.2 shows the 3-dimensional linguistic characterization of those four clusters.
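The logic of grouping VBDUs by their three dimension scores can be sketched with a basic k-means procedure. This is a stand-in for illustration only: the study used the FASTCLUS procedure in SAS, whereas the code below is a plain Lloyd-style k-means with a simple farthest-point choice of starting seeds, and the four data points are invented. Each point is a (Dimension 1, Dimension 2, Dimension 3) triple, and the returned centroids are the mean dimension scores of the units assigned to each cluster.

```python
import math


def kmeans(points, k, max_iter=100):
    """Partition points into k mutually exclusive clusters (illustrative k-means).

    Returns (assignments, centroids). Seeds are chosen deterministically:
    the first point, then repeatedly the point farthest from all chosen seeds.
    """
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    centroids = [tuple(s) for s in seeds]

    assignments = None
    for _ in range(max_iter):
        # Assign every unit to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no unit changed cluster
        assignments = new_assignments
        # Recompute each centroid as the mean of its members' dimension scores.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids


# Hypothetical (Dimension 1, Dimension 2, Dimension 3) scores for four units.
scores = [(5.0, -3.0, 4.0), (6.0, -4.0, 5.0), (-8.0, -2.0, 0.5), (-9.0, -3.0, 1.0)]
assignments, centroids = kmeans(scores, k=2)
print(assignments)
print(centroids)
```

Running the same procedure for several values of k and comparing a clustering criterion across the solutions is the general strategy for choosing the number of clusters.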
Two of the three dimensions are particularly strong in determining which
VBDU falls into which cluster. Dimension 1, Contextual orientation versus Conceptual,
informative focus, is the most powerful predictor (R squared: 0.7759),
while Dimension 2, Personalized framing, is also a strong predictor (R squared:
0.4024). Dimension 3, Interactive dialogue versus Teacher monologue, has the least
effect (R squared: 0.0636) in determining cluster membership.
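An R squared value of this kind can be read as the share of a dimension's total variation that lies between clusters rather than within them. The sketch below assumes that definition (one minus the within-cluster sum of squares divided by the total sum of squares); whether the reported values were computed in exactly this way is not stated in the text, and the scores and cluster labels are invented for illustration.

```python
def r_squared(values, labels):
    """Proportion of variance in `values` accounted for by cluster membership."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)

    ss_within = 0.0
    for label in set(labels):
        members = [v for v, l in zip(values, labels) if l == label]
        cluster_mean = sum(members) / len(members)
        ss_within += sum((v - cluster_mean) ** 2 for v in members)

    return 1 - ss_within / ss_total


# Hypothetical scores on one dimension for four units in two clusters.
clusters = [0, 0, 1, 1]
separating = [1.0, 2.0, 9.0, 10.0]      # clusters differ sharply on this dimension
non_separating = [1.0, 9.0, 2.0, 10.0]  # clusters overlap on this dimension

print(round(r_squared(separating, clusters), 4))
print(round(r_squared(non_separating, clusters), 4))
```

A dimension on which the clusters barely differ yields a value near zero, which is the pattern reported for Dimension 3.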
Centroid measures help to determine the typical linguistic characteristics of
each of the four clusters. Centroids are mean scores, reflecting the central characterizations
of each cluster with respect to each dimension (Biber, 1995, p. 323).
Table 8.3 provides descriptive statistics for the centroid measures of each cluster.
The clusters are roughly the same size, with c. 1200–1700 VBDUs being
grouped into each one.
1. The statistical cluster analytical procedures were run in SAS, a statistical computer program.
To carry out the cluster analytical procedures, FASTCLUS, a type of partitioning technique
that allows the classes to be mutually exclusive, was run on the nearly 6000 VBDUs, with the
three dimension scores as predictors. In order to find the optimal number of clusters, researchers
normally run a set of analyses with varying clusters while closely monitoring the relationship
between the value of the clustering criterion and the number of variables. In our case,
the number of variables is 3, since we have three dimension scores for each discourse unit. As
Everitt (1974) explains:
It is generally suggested that a plot of the criterion value against the number of groups
will indicate the correct number [of clusters] to consider by showing a sharp increase (or
decrease, if the criterion is being minimized), at the correct number of groups. (p. 59)
Solutions of 3, 4, and 5 clusters were run and the data were inspected. A sharp decrease was
shown in the value of the cubic clustering criterion between solutions 5 and 4, while no major
change was shown between solutions 3 and 4. Hence, in the light of Everitt's observation above,
a 4-cluster solution was applied in the present study. This solution grouped those VBDUs together that
share linguistic characteristics, while the clusters themselves remained maximally distinct.
Figure 8.2 Clusters plotted in a three-dimensional space represented by the three dimen-
sions of variation
VBDUs falling into Cluster 1 have the highest use of features related to positive
scores on Dimension 2, reflecting personalized, narrative kinds of discourse. At
the same time, Cluster 1 has the lowest positive score on both Dimension 1 and
Dimension 3, showing a moderate use of contextual features as well as moderately
dialogic discourse.
VBDUs falling into Cluster 2 have extremely large negative scores on Dimen-
sion 1, reflecting a very low number of linguistic features associated with a contex-
tual orientation, and a dense use of features associated with an informational fo-
cus. The VBDUs in this cluster also lack linguistic features associated with
personalized framing (a negative Dimension 2 score), and have a large negative
Dimension 3 score, reflecting long teacher monologues.
Table 8.3 Centroid measures for the four clusters
Variable        N       Mean   Std Dev   Minimum   Maximum
Cluster 1
Dimension 1     1315    0.90    4.38     -11.77    20.08
Dimension 2     1315    5.63    5.54       2.58    36.31
Dimension 3     1315    0.06    3.28      -4.64    14.89
Cluster 2
Dimension 1     1178  -12.49    4.68     -42.14     5.93
Dimension 2     1178   -2.74    3.25      -9.16     8.83
Dimension 3     1178   -1.75    2.12      -4.75    10.46
Cluster 3
Dimension 1     1654   10.16    4.66       3.20    37.52
Dimension 2     1654    0.57    4.79      -6.95    35.08
Dimension 3     1654    0.83    4.06      -4.63    23.47
Cluster 4
Dimension 1     1700   -1.88    3.21     -10.74     6.63
Dimension 2     1700   -3.03    2.21      -7.84     6.89
Dimension 3     1700    0.37    3.96      -4.69    33.12
In contrast, Cluster 3 has the highest positive Dimension 1 score and the highest
positive Dimension 3 score. Hence, VBDUs in this cluster exhibit linguistic fea-
tures associated with a dialogic type of discourse and a strong contextual orienta-
tion. Finally, Cluster 4 is characterized by negative features on both Dimension 1
and Dimension 2, and a score near 0.0 on Dimension 3.
2.1 Interpreting the clusters as VBDU types based on their linguistic characteristics
In Section 1.2, it was shown that each VBDU is characterized by all three dimen-
sions, and as described in Table 8.3 above, each cluster can also be characterized
by scores on the three dimensions. It is further possible to plot the centroids of
each cluster, providing an overall multi-dimensional profile of the VBDU-types, as
in Figure 8.3. Based on these linguistic profiles, we are able to propose interpretive
labels for each VBDU Type: Personalized framing (Cluster 1), Informational mono-
logue (Cluster 2), Contextual interactive (Cluster 3), and Unmarked (Cluster 4).
Figure 8.3 Four types of VBDUs (Personalized framing, Informational monologue, Con-
textual interactive, and Unmarked) on three dimensions
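As a rough illustration of how a multi-dimensional profile maps onto these four types, the sketch below assigns a unit to the type whose centroid is nearest in Euclidean distance. The centroid values are the cluster means from Table 8.3, with negative signs as described in the running text; the function name `nearest_type` and the example score triples are hypothetical and are not taken from the corpus.

```python
import math

# Mean Dimension 1, 2, and 3 scores per cluster (Table 8.3), signs per the text.
CENTROIDS = {
    "Personalized framing": (0.90, 5.63, 0.06),        # Cluster 1
    "Informational monologue": (-12.49, -2.74, -1.75), # Cluster 2
    "Contextual interactive": (10.16, 0.57, 0.83),     # Cluster 3
    "Unmarked": (-1.88, -3.03, 0.37),                  # Cluster 4
}

def nearest_type(scores):
    """Return the VBDU type whose centroid is closest to the given scores."""
    return min(CENTROIDS, key=lambda name: math.dist(scores, CENTROIDS[name]))

# Hypothetical dimension-score triples for four units.
examples = {
    "unit A": (0.0, 8.0, 0.0),
    "unit B": (-11.0, -2.0, -1.0),
    "unit C": (9.0, 1.0, 2.0),
    "unit D": (-2.0, -3.0, 0.0),
}
assigned = {unit: nearest_type(scores) for unit, scores in examples.items()}
for unit, vbdu_type in assigned.items():
    print(unit, "->", vbdu_type)
```

The sketch only shows the geometry of the profiles; the interpretive labels themselves come from the analysis of the linguistic features, not from the distance computation.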
2.1.1 Cluster 1: Personalized framing

VBDUs in Cluster 1 typically have extremely large Dimension 2 scores (Personalized
framing), but unmarked scores on Dimensions 1 and 3. Hence, this VBDU
type exhibits by far the most linguistic features associated with the personal framing
of the speaker's ideas, using features like factual verbs and likelihood verbs
controlling that-clauses. Text Excerpt 8.4 illustrates this VBDU type.
Text Excerpt 8.4. Example of VBDU Type Personalized framing (Cluster 1)
(Positive Dimension 2 linguistic features are in Capitalized italics:
that-complement clauses, factual and likelihood verbs, mental verbs, you
know, I mean, and third person animate pronouns.)
Student: Yeah, kind of [laughter]
Teacher: you don't you don't believe that he did that?
Student: No I mean I don't really want to take it as a I don't know I
mean I take it for what it is which is a further mythologizing of the
great Thomas Edison as one of the great men and the great I don't
KNOW I mean he says I'm sure the story has been embellished
throughout time and I THINK yeah that's the thing I underlined
and I MEAN I don't KNOW I'm sorry keep going
Student: yeah
Student: no I
Teacher: but the point is what that story should (hear what...that story's)
about communication
Student: right I I KNOW
Teacher: Now not that it shows about Edison per say about Edison's genes,
Student: right
Teacher: you YOU KNOW, or genius but what it says about genius as it was
it constructed in these communications networks
Student: But I wanted more of that
Personalized framing VBDUs reflect framing not only from structural and semantic
perspectives (e.g., verbs controlling that-complement clauses or mental verbs,
respectively) but also from an interactional perspective. In addition to expressing
ideas, two specific cognitive verbs (think and know) also show interactional framing
functions. In these instances, they appear as fixed expressions such as I mean
and you know, or are often used together (I mean you know or you know I mean),
constituting frequently occurring four-word combinations (lexical bundles; see
Biber et al., 2004). Either together or simply repeated separately, these combinations
provide time for the speaker to hold the floor in the interaction while they
are (re)formulating their ideas or expressing their opinions on line.
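To make the notion of counting such fixed expressions concrete, the following sketch tallies I mean and you know in individual turns. It is a simplified illustration, not the tagging procedure used in the study: the function name `marker_counts` is hypothetical, matching is a plain case-insensitive string match, and the two sample turns are shortened from Text Excerpt 8.4.

```python
import re

# Fixed expressions treated as interactional framing devices in the text.
MARKERS = ("i mean", "you know")

def marker_counts(turn):
    """Count each marker in one turn, ignoring case and punctuation."""
    words = " ".join(re.findall(r"[a-z']+", turn.lower()))
    return {m: len(re.findall(r"\b" + re.escape(m) + r"\b", words)) for m in MARKERS}

# Shortened turns adapted from Text Excerpt 8.4.
student_turn = ("No I mean I don't really want to take it as a I don't know I "
                "mean I take it for what it is")
teacher_turn = "you YOU KNOW, or genius but what it says about genius"

student_counts = marker_counts(student_turn)
teacher_counts = marker_counts(teacher_turn)
print(student_counts)
print(teacher_counts)
```

A string match of this kind cannot tell a discourse-marker use of you know from a literal one, so counts produced this way would still need checking in context.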
In Text 8.4, the teacher challenges the student's statement on the given topic,
prompting the student to think about his interpretation of the text under discussion.
Accordingly, the student reformulates his thoughts to restate his position,
which is then supported by the teacher.
Besides clarifying ideas and expressing opinions through reformulation, as we
have seen in the extract above, further communicative functions are apparent in
units that exhibit language associated with Personalized framing. For example,
when the frequent use of third person pronouns and past tense co-occur, they are
most commonly associated with personal narratives. In the extract below, third
person pronouns and past tense co-occur with conditional clauses. These three fea-
tures together signal hypothetical past, suggesting an interpretative function of this
segment. Other features from the two dimensions (e.g., modals) are also marked.
Text Excerpt 8.5. Example of VBDU Type Personalized framing (Cluster 1) show-
ing narrative features
(Positive Dimension 1 features are marked by bold italics. Positive Dimension
2 features are marked by CAPITALIZED ITALICS.)
Student: To me if she didn't know, what the radiation was doing to her,
then her power that she possessed wouldn't hold true. I
mean supposedly why she had power was because she was doing
something for the good of mankind. Um, if she truly didn't
know what was going on that leads me to think that, she it
was more of an accident that she found out these things.
Teacher: Mhm
Student: And if it's an accident you truly don't (exert) a lot of power.
Teacher: Ok.
Student: She did make a sacrifice but she didn't know she was doing it.
Student: Exactly, exactly.
Teacher: The poet seems to think she was denying these things, for
some reason. Maybe she was aware but she denied.. refused to
accept it.
Student: Yeah. Yeah I agree with that. I mean that's what I was thinking
like like she suspected it, but she wasn't gonna admit it to anybody
or, profess it to the world. I mean she was gonna go on with
what she was doing.
Teacher: Mhm. I mean she was a scientist after all.
Student: Right.
Teacher: She can look at the cause effect relationship.
VBDUs with a narrative tone potentially serve multiple communicative, instructional,
and discourse purposes in a classroom context. For example, they are used
to raise attention or to create background to an immediately forthcoming point, to
expand on a previously made major point, or to provide a niche to interpret a previously
introduced text or other instructional materials. This interpretative focus
is highlighted in Text Excerpt 8.5.
2.1.2 Cluster 2: Informational monologue

Informational monologue VBDUs typically have large negative scores on all three
dimensions, meaning that they are highly informational (Dimension 1), marked
by the absence of personal framing features (Dimension 2), and primarily monologic,
with long turns taken by the teacher (Dimension 3). Text Excerpt 8.6 illustrates
this VBDU type.
Text Excerpt 8.6. Example of VBDU Type Informational monologue (Cluster 2)
(Negative Dimension 1 features are in bold. Note also the lack of the linguistic
features from positive Dimension 2 which shows an impersonal style, and the
long teacher turn reflecting monologic discourse, which is a feature of nega-
tive Dimension 3)
Teacher: Okay. All righty um what I want to do is continue with this discussion
that we've been trying to show, between the interaction of history
and, language change, and again as I state we're using the Romance
languages as sort of our test case, because we have an
abundant documentation of the situation both of the historical
development of the Romance languages the historical background
of real world events which occurred from the start of the Roman
Empire right up through the fall of the Roman Empire, and the
ultimate uh fate of the various provinces of the Roman Empire,
and we also have an abundant corpus of linguistic documenta-
tion. so to a large extent the Romance languages present an ideal
case, uh for studying, the development of language against the
background of history. or conversely how history affects language.
one of the issues that I want to particularly concentrate on today is
the issue of the linguistic, uh impact of language contact. one of
the main historical themes that I've been stressing, throughout the
last uh couple of classes has been that in the history of (thin) the
linguistic history of the Roman Empire, weved movements of
peoples. we first of all have the expansion of the Romans. as the
Romans left Rome and over the course of several centuries ex-
panded their territorial domain, to what was to become the Ro-
man Empire which at its height, stretched from Ireland all the way
in uh through west bowell most of uh central western and eastern
Europe and through the Mediterranean basin both the north and
the south shore the Mediterranean, and beyond into Asia Minor.
In this segment, the teacher summarizes the major theme of a learning unit for the
given class session: the interaction between history and language change. VBDUs
from this type serve purposes such as academic reporting, reading written text out
loud, listing, and so on.
2.1.3 Cluster 3: Contextual interactive

Contextual interactive VBDUs have the highest positive scores on Dimension 1
(Contextual orientation vs. Conceptual, informational focus) and Dimension 3
(Interactive dialogue vs. Teacher monologue), as well as a small positive score on
Dimension 2 (Personalized framing). A Contextual interactive VBDU exhibits lin-
guistic and interactive features that are associated with texts reflecting high par-
ticipant involvement. As illustrated in Text Excerpt 8.7, references are made to the
immediate spatial or mental context as a hypothetical situation is discussed while
the participants are involved in a dialogue.
Text Excerpt 8.7. Example of VBDU Type Contextual interactive (Cluster 3)
(Positive Dimension 1 features are marked by bold italics. Positive Dimension
2 features are marked by CAPITALIZED ITALICS.)
Teacher: . what else would possibly happen? yes?
Student: they could go insane. like shoot up the building
Teacher: some people get so stressed they come in and shoot up the building.
and sometimes they shoot up the people and sometimes they kill
people. and that's a certainly a very critical ultimate sign of stress.
right? Ok. gosh the post office two or three years ago boy if I saw
a post office worker in the post office building I would said I would
(bug) out of there because we had two or three didn't we? yeah?
Student: but you know well I notice in my office when people are having
conflict or whatever they do start to lay out more and call out sick
a little bit more.
Teacher: they do what?
Student: if you know they're having some type of conflict at work they start
calling out. they don't come in and work you know
Teacher: oh they report out. absenteeism
Student: mhm
Teacher: or tardiness certainly. um if you dread to go to work, (what's follows
up a natural um
Student: it wouldn't take much to
Student: yeah
Student: just a sniffle or
Student: yeah
Teacher: that's right you make an excuse not to come to work (to) put yourself
in a situation that's not comfortable. Ok.
In this text excerpt, a concept is discussed through exemplification. First the
teacher asks the students to think about a hypothetical scenario. After making a
general statement on the given topic the teacher gives a concrete example of an
incident relating to the topic. The students follow up on the concrete example with
more examples from their own context that they characterize even further through
a dialogue.
2.1.4 Cluster 4: Unmarked

While the other three VBDU types differ sharply from each other in one or two
dimensions, this last type exhibits no specific distinctiveness from the others, hav-
ing scores near 0.0 on all three dimensions.
3 From VBDU text-types to discourse structure
3.1 Functional interpretation of VBDU types
To this point, the VBDU-types have been interpreted primarily by considering
their multi-dimensional profiles (see Figure 8.3). To provide a more detailed functional
interpretation of each VBDU type, I carried out a survey of VBDUs from
each type, identifying the primary instructional purpose of each one.
A sample of fifty VBDUs (approx. 1% of the corpus) was selected for the sur-
vey. Each VBDU was analyzed to determine its primary instructional purpose.
Eight major categories are distinguished, based on previous research (e.g., Cazden,
1986; Marton & Tsui, 2004; Sinclair & Coulthard, 1975), and supplemented by a
detailed consideration of the tasks performed by instructors and students in the
current corpus:
1. Academic reporting: presentation of out-of-class research project or in-class
group work (students); presenting a case study or the results of past studies
(teacher)
2. Exposition: stating a series of facts to explain a concept; transmitting informa-
tion by restating facts from a book; providing a conceptual framework for a
topic; classifying measurements; direct quotation from written text
3. Demonstration: demonstrating a computer program (as the focus of the pres-
entation);
4. Expansion: topic elaboration; exemplifying through personal opinion, narra-
tive, or reformulating the content of written text; contrasting multiple aspects
of past events, discussing discipline specific matters, narrating past events in-
cluding personal and professional experience (facts and reflection), or through
solving a problem on-line;
5. Elicitation: brainstorming terms and concepts on a particular topic through
post reading reflections, eliciting definitions of a concept from the reading;
6. Management: three types of management are distinguished:
a. Class management with activities such as: finding lecture notes; finding
past exams on the network; procedures to disseminate exams; collecting
exams; procedures for exam evaluation; handing in assignments; dead-
lines and the sequence of assignments, activities, and exams; roles and
procedures to carry out a project.
b. Instructional management with activities such as: using contextual re-
sources (visuals, e.g., board, graph, chart, computer program); making
topic related references to other readings or materials; illustrating a con-
cept by using a computer.
c. Technical management with activities such as: getting ready for a presenta-
tion; changing sites during an IITV distance education class; putting a
microphone on.
7. Scaffolding: analyzing a math problem (step by step analysis on board); step by
step procedures (e.g., with a computer program or an in-class activity)
8. Summarizing: summarizing previously shared knowledge.
The distribution of the different types of VBDUs across instructional purposes is
summarized in Table 8.4 below.
Table 8.4 Instructional purposes and patterns of discourse unit types
Instructional purpose Discourse unit type
Contextual Informational Personal Unmarked Total

1. Academic reporting 3 (75%) 1 (25%) 4 (100%)
2. Exposition 1 (11%) 8 (89%) 9 (100%)
3. Demonstration 1 (100%) 1 (100%)
4. Expansion 2 (13%) 10 (67%) 3 (20%) 15 (100%)
5. Elicitation 3 (100%) 3 (100%)
6. Management 14 (100%) 14 (100%)
7. Scaffolding 1 (33%) 2 (67%) 3 (100%)
8. Summarizing 1 (100%) 1 (100%)
Total 19 (38%) 11 (22%) 12 (24%) 8 (16%) 50 (100%)
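The percentages in Table 8.4 are row percentages, and the figures in the Total row are shares of the fifty sampled VBDUs. As a small worked check, the sketch below recomputes the Total row and the Expansion row from the counts printed in the table. The helper name `row_percentages` is hypothetical, and rounding to whole percentages is assumed to match the table.

```python
def row_percentages(counts):
    """Convert a row of counts to whole-number percentages of the row total."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]

# Counts from the Total row of Table 8.4 (50 sampled VBDUs).
column_totals = {"Contextual": 19, "Informational": 11, "Personal": 12, "Unmarked": 8}
overall = dict(zip(column_totals, row_percentages(list(column_totals.values()))))

# Counts from the Expansion row (15 VBDUs spread over three types).
expansion = row_percentages([2, 10, 3])

print(sum(column_totals.values()), overall)
print(expansion)
```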
Interesting patterns emerge from this sample data. Academic reporting and Exposition
instructional functions are usually realized as Informational, monologic
VBDUs. Management activities are always realized as Contextual interactive VBDUs
in this sample. Expansion activities (e.g., exemplifying, interpreting, and
clarifying) are usually personal (Personalized framing VBDUs) rather than strictly
informational presentations.
Overall, three major patterns emerge from this survey. First, there is almost a
1-to-1 correspondence between Informational monologue VBDUs and academic
/ expository instructional purposes. Second, Personalized framing VBDUs are
usually used for expansion activities. And finally, Contextual interactive VBDUs
are usually used for management activities. However, it is interesting that these
Contextual interactive VBDUs seem to be the most versatile, serving expansion
and demonstration functions, and even exposition purposes in one case. In sum,
although the present survey is based on a small sample of VBDUs, it identifies
striking differences among the instructional purposes typically served by each
VBDU Type.
3.2 Texts as sequences of VBDU types2
One of the central goals of this book is to provide a comprehensive linguistic de-
scription of discourse units and the flow of discourse within texts. In Section 1.3
above, I showed how this goal can be addressed by tracking the multi-dimension-
al profile of VBDUs across all discourse units of a classroom teaching session. I
return to this goal here, using the VBDU-Types to track the flow of discourse in
these texts.
Table 8.5 below itemizes all VBDUs in the same classroom teaching session
that was discussed in Section 1.3. For each VBDU, this table lists both the VBDU-
type and the primary instructional purpose accomplished in the VBDU.
We saw in Section 1.3 that the first three VBDUs in this text were very differ-
ent in their multi-dimensional linguistic characteristics; in Table 8.5 we see that
these three VBDUs belong to three different VBDU-Types: Contextual interac-
tive, Unmarked, and Informational teacher monologue. The linguistic charac-
teristics of these VBDUs are discussed in detail as Text Excerpts 8.1, 8.2, and 8.3
(above). (VBDU 7 from this text was also discussed earlier, as Text Excerpt 8.7).
In the discussion below, I focus on three other VBDUs from this class session:
#26–28. These VBDUs come from three different types, and thus further illustrate
how the types exhibit different linguistic profiles, corresponding to differences in
communicative and/or instructional purposes.
The linguistic profile of VBDU #26 places it into the Personalized framing
type. The dominant linguistic features in this VBDU are positive Dimension 2
features, such as that-deletion, mental, factual, and likelihood verbs (e.g., find,
think, know), third person personal pronouns, and past tense. This VBDU type
also has a moderately positive Dimension 1 score (contextual orientation).
2. It is important to emphasize that although short VBDUs (less than 100 words) are not in-
cluded in the linguistic analysis, they have important functions in a stretch of discourse. Due to
space limitations, however, the present study does not attempt to provide a detailed account of
either their linguistic characteristics or their functional purposes. As a result, the study can offer
only a restricted description of the overall organization of class sessions. In general though, class
sessions include multiple longer VBDUs, making it possible to provide preliminary compari-
sons of the overall discourse organization of class sessions.
Table 8.5 Instructional purposes of the discourse units in a business management class
VBDU    Cluster  VBDU Type                        Instructional Purpose
Number  Number
1       3        Contextual interactive           Management (class management)
2       4        Unmarked                         Elicitation
3       2        Informational teacher monologue  Exposition
5                Short*
7       3        Contextual interactive           Expansion (exemplification)
8                Short
9                Short
10      4        Unmarked                         Expansion (explanation)
11               Short
12      1        Personalized framing             Expansion (elaboration)
13               Short
14               Short
15      1        Personalized framing             Expansion (elaboration)
19      2        Informational teacher monologue  Expansion (exemplification)
20      4        Unmarked                         Expansion (exemplification)
21               Short
22               Short
23      2        Informational teacher monologue  Exposition (description)
25      3        Contextual interactive           Expansion (exemplification)
26      1        Personalized framing             Expansion (interpreting, clarifying text)
27      3        Contextual interactive           Summarizing (reformulating text)
28      2        Informational teacher monologue  Direct quotation
29      1        Personalized framing             Expansion (interpreting text)
30               Short
31      4        Contextual interactive           Summarizing
32               Short
* Short indicates that the VBDU was shorter than 100 words and so was not included in this analysis.
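One simple way to look at a class session as a sequence of VBDU types is to tally the types and the shifts between consecutive analyzed units. The sketch below does this for the units listed in Table 8.5, using the type labels as printed there. It is an illustration only: short VBDUs are skipped, so "consecutive" means consecutive among the analyzed units, and the variable names are hypothetical.

```python
from collections import Counter

# (VBDU number, VBDU type) for the analyzed units in Table 8.5, in order.
# Units marked "Short" are omitted because they were not analyzed.
sequence = [
    (1, "Contextual interactive"), (2, "Unmarked"),
    (3, "Informational teacher monologue"), (7, "Contextual interactive"),
    (10, "Unmarked"), (12, "Personalized framing"),
    (15, "Personalized framing"), (19, "Informational teacher monologue"),
    (20, "Unmarked"), (23, "Informational teacher monologue"),
    (25, "Contextual interactive"), (26, "Personalized framing"),
    (27, "Contextual interactive"), (28, "Informational teacher monologue"),
    (29, "Personalized framing"), (31, "Contextual interactive"),
]
types = [vbdu_type for _, vbdu_type in sequence]

# How often each type occurs in the session.
type_counts = Counter(types)

# Pairs of consecutive analyzed units, and those where the type changes.
transitions = Counter(zip(types, types[1:]))
shifts = sum(n for (a, b), n in transitions.items() if a != b)

print(type_counts)
print(shifts, "type shifts across", len(types) - 1, "consecutive pairs")
```

In this session the type changes at almost every step, which is consistent with the description of the class as a series of shifts between linguistically distinct units.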
Text Excerpt 8.8

VBDU 26 Personalized framing (summary and clarification)
(Positive Dimension 1 features are marked by bold italics. Negative Dimension 1
features are in bold. Positive Dimension 2 features are marked by
CAPITALIZED ITALICS.)
Teacher: got oh we need to find out about the hot stove rules uh and
there're three or four really important points in the hot stove rule.
and who was the name of the person who wrote this?
Student: Douglas McGreagor.
Teacher: Douglas.
Student: McGreagor.
Teacher: Mcgreagor. Ok. um summarize those four or five points out of DM's
hot stove rule.
Student: well first of all what he's talking about is when a manager has to
manage conflict at work. he has to act like uh or he's going from
the frame of reference if you're the mother or father and your child
or whatever touches the stove and how that child reacts, that's kind
of the frame of reference he's going from. so you're saying the manager
has to be swift uh you have to quickly establish rules and policies
so you won't have difficulties and harassments in the future.
Teacher: now why is being swift and stepping in immediately important um
rather than sitting there and letting it just go on?
Student: because if you let it fester up YOU KNOW people are going to get
that much more upset with each other and YOU KNOW somebody
else might even get involved. it's just better to nip it in bud while
you have the opportunity.
Teacher: Ok nip (it) in the bud is probably a good term. now what was the
second point that D.M. made?
Student: it's relatively intense with the first offense and what he's saying there
is like the first time it happens you want to make sure you jump on it
with both feet so that both the parties don't THINK it's alright to do?
and they need to YOU KNOW cease that kind of activity and you
don't want to give the same people a second chance to do the same
thing again, same conflict arise same disagreements
Teacher: Ok. what was the third point?
In VBDU #26, the teacher prompts students to summarize and to explain multiple
aspects of the concept under discussion. The primary instructional purpose focuses
on interpreting and clarifying a written piece of text, with students expressing
their own personal interpretations. As a result, we see frequent use of 3rd person
pronouns, referring to the author of the source text, and frequent use of the
discourse marker you know framing the student's interpretation. Positive Dimension 1
features are also relatively common in this VBDU, including 2nd person
pronouns, conditional clauses, and activity verbs. In sum, VBDU #26 displays the
linguistic features of a Personalized framing VBDU while reflecting the most typical
instructional purpose of this VBDU type: Expansion (interpreting and clarifying;
see Table 8.5).
The next segment, VBDU #27, is from the Contextual interactive type; the
dominant lexico-grammatical features are first and second person pronouns,
contractions, and modals (positive Dimension 1 features), along with interactive
features from positive Dimension 3, such as relatively short turns and discourse
markers such as well and ok.
Text Excerpt 8.9
VBDU #27 Contextual interactive (elaboration)
(Positive Dimension 2 features are marked by CAPITALIZED ITALICS. Positive
Dimension 3 features are marked by underlined italics.)
Student: it's impersonal. just YOU KNOW (you) can't take this personally
because it's just my job. I MEAN I think THEIR point is that the
manager should let the employees know that this is nothing personal
no matter what HE or SHE decides.
Teacher: Ok. now quite often we as managers we do make that mistake. we
personalize things giving the impression that I don't like you uh
you're a screwball uh gosh you're dumb. you want to take the person
out of it saying that is a non-acceptable act nothing about the
person, it's just a non-acceptable act and you can't do that. Ok.
what's the next point?
Student: well the next one in my opinion is just like it's personal. it says it
emphasizes behavior not the person? which in my opinion is the
same thing as.
Teacher: don't personalize (it). Ok. and the last one?
This VBDU begins by summarizing the previous discussion. The student summarizes
the ideas in the text and makes direct references to the authors of the text by
using third person pronouns such as their, he, and she. Responding to this, the
teacher repeats the content of the student's summary but uses 1st and 2nd person
pronouns (you and we) to include the actual classroom participants, bringing
the hypothetical situation back to the physical context. The instructional purpose
of this segment is Summarizing (see Table 8.5), but the summary involves a direct
contextualized interaction with students.
Finally, VBDU #28 comprises a quote from the written text and thus repre-
sents the Informational monologue type. The dominant linguistic characteristics
are negative Dimension 1 features (e.g., nouns, attributive adjectives, past tense
features, prepositions) and negative Dimension 3 (long turns).
Text Excerpt 8.10
VBDU # 28 Informational monologue (written text that is read aloud)
(Positive Dimension 2 features are marked by CAPITALIZED ITALICS.)
Student: it is consistent. uh HE says in HIS example the hot stove will burn
you every time therefore the punishment should be consistent and
without favoritism and
Teacher: one of the things THEY stress in parenting is to be consistent and
particularly with parents um some parents are inconsistent be-
tween siblings. uh fathers are notorious for letting THEIR little
darling girls get away with what THEY swat the boys about. and
mamas have a tendency to let the boys get away with things a little
stricter on the girls. that's not in all cases but it's certainly possible.
Ok. let us look at the case. the case was about what B? Ok. how
about somebody summarizing the reading of the case for us. the
man with the mic over there can say it better than anybody.
Student: problems at the hospital. Smith County is a suburban area near
a major midwestern city. The county has experienced such a tre-
mendous rate of growth during the past decade that local govern-
ments have had difficulty providing adequate service to the citi-
zens. Smith County Hospital has a reputation for being a first
class facility but it is inadequate to meet local needs. During cer-
tain periods of the year the occupancy rates exceed the licensed
capacity. There is no doubt in anyone's mind that the hospital must
be expanded immediately. At a recent meeting of the hospital au-
thority, the hospital administrator K. A. presented the group with
a proposal to accept the architectural plans of the firm of W and G
G. the plan calls for a hundred bed additional addition adjacent to
the existing structure. K. announced that after reviewing several
alternative plans, SHE believed the W and G plans would provide
the most benefit for the expenditure. At this point R.R.L. the
board chairperson began questioning the plan. R. made it clear
HE would not go along with the W and G plan. HE stated that the
board should look for other firms to serve as the architects for the
project. The ensuing argument became somewhat heated and a
ten minute recess was called to allow those attending to get coffee
as well as allow tempers to calm down. K was talking to J. R. an-
other member of the hospital authority board in the hall and said
R. seems to fight me on every project. R. who was talking to other
members of the board was saying I know that the W and G plan is
good but I just can't stand for K. to act like it's HER plan. I wish
SHE would leave so we could get a good administrator from the
community who whom we can identify with.
Teacher: the text asks the question Is R.'s reaction uncommon? Explain.
Somebody want to comment on that?...
Student: I would say it's not with a big change like that there's usually people
are change.
Teacher: THEY usually oppose change did you say? Ok. anybody else got a
comment?
What is most noticeable in VBDU 28 is the large number of linguistic features
from the negative side of Dimension 1, more characteristic of written text than
typical speaking. This VBDU also includes some personal framing, by both teacher
and student, as an expansion of the written text; but by far the dominant style of
this VBDU is determined by the typical linguistic characteristics of informational
written prose.
The series of three extracts above illustrates how differences in the VBDUs'
linguistic features relate to the differences in context and primary instructional
purposes. Taken together, an analysis of these shifts provides a linguistic perspective
on the discourse structure of a classroom teaching session.
4 Summary and conclusion
The primary goal of this chapter was to demonstrate how the discourse structure
of an individual class session can be described based on the intra-textual linguistic
changes identified through variation in VBDUs. This chapter examined the discourse
patterns of an upper division Business Management class session, relying
on patterns of linguistic variation identified from a corpus of university class sessions.
The VBDU Types were analyzed in terms of their typical instructional purposes.
For example, Informational monologue VBDUs typically relate to instructional
purposes such as Academic reporting and Exposition; Contextual
interactive VBDUs relate to class management functions; and Personalized framing
VBDUs relate to Expansion as an instructional purpose, including activities
such as clarification or text interpretation.
Each classroom teaching session is uniquely organized beyond its macro-pat-
terns (Young, 1994). However, describing the structure of classroom discourse be-
yond its general macro-structure (e.g., openings and closings) has proven to be a
challenging task for researchers. The present study contributes to this area of re-
search by applying corpus-based methods to the analysis of classroom teaching
discourse. It takes the co-occurring patterns of a large number of linguistic items
in a large number of units as the basis for the analysis. The study further describes
the relationships between language variation and corresponding communicative
functions and instructional purposes in class sessions. As a result, variation within
class sessions can be described, allowing the analyses to go beyond the description
of simple macro-structure (openings and closings).
Chapter 9
Conclusion
Comparing the analytical approaches
1 Overview
In Chapter 1, we noted that there have been three major domains of inquiry asso-
ciated with discourse analysis: 1) the study of language use; 2) the study of lin-
guistic structure beyond the sentence; and 3) the study of social practices and
ideological assumptions that are associated with language and/or communication
(see also Schiffrin et al., 2001).
Many recent discourse studies focus on the third perspective. For the most
part, these studies do not deal with linguistic analysis, and often they do not even
provide examples of specific texts. Rather, studies of this type tend to discuss communication
practices and cultural norms and relationships, apart from their linguistic
instantiation in particular texts. As Scollon and Scollon (2001, p. 538) put
it, "In this line of development the primary focus is on society and social practice,
with an attenuated or even absent interest in texts or discourse in the narrower
linguistic sense."
At the other extreme, the field of corpus linguistics has become almost syn-
onymous with the study of discourse as language use (#1 above). That is, the major
strength of corpus-based analysis is to document empirically the patterns of lan-
guage use in some collection of texts. As a result, most recent discourse studies of
language use are based on analysis of a corpus, and conversely, most studies in
corpus linguistics describe how lexical/grammatical features are used in discourse.
For example, well over half of the articles published in recent issues of the Interna-
tional Journal of Corpus Linguistics state that they are studying characteristics of
discourse. Recent corpus-based books have also been quite explicit about their
discourse basis, with titles like Discourse in the Professions, Strategies in Academic
Discourse, Textual Patterns, Using Corpora in Discourse Analysis, and Exploring
Discourse through Corpora. In almost all cases, these corpus investigations are
studies of discourse in that they document the patterns of language use in a repre-
sentative collection of natural texts. Such studies focus on the traditional concerns
of corpus linguistics, such as patterns of collocation, the use of particular gram-
matical features, patterns of grammar in association with lexis, the typical linguis-
tic characteristics of particular genres/registers, etc.
However, the second analytical perspective on discourse listed above (the
study of discourse structure, text organization, and linguistic structure beyond
the sentence) has been neglected in recent years, whether undertaken from qualitative
or corpus-based perspectives. This trend represents a dramatic departure
from earlier research on discourse, where the description of textual structure and
organization was a central concern. For example, the classic textbook on discourse
analysis by Brown and Yule (1983) focused almost entirely on linguistic structure
beyond the sentence, with sections on discourse topics, topic boundary markers,
thematic structure, information structure, cohesion, coherence, frames, scripts,
schemata, etc. Every chapter contained detailed discussion and numerous exam-
ples of how these aspects of discourse structure are realized linguistically. While
such research was popular in the 1970s, 1980s, and early 1990s (e.g., Brown &
Yule, 1983; Grimes, 1975; Mann & Thompson, 1992), in recent years it has become
much less common to approach discourse as the study of linguistic structure or
organization above the sentence-level. Thus, for example, of the 41 chapters in the
recent Handbook of Discourse Analysis (Schiffrin, Tannen, & Hamilton, 2001),
only seven chapters (by Martin, Ward and Birner, Polanyi, Dubois and Sankoff,
Stubbs, Herring, and Chafe) include any discussion or examples of how language
features are used to organize discourse above the sentence level.
Given this general decline in the study of discourse structure and textual or-
ganization, it is not surprising that there have been almost no previous corpus-
based studies of discourse organization. Of course, the methodological challenges
of such investigations have also been a major deterrent. The study of discourse
organization requires detailed analysis of each individual text, to identify its
text-internal structures and to describe their pragmatic functions. But as noted
above, corpus-based research has instead focused on linguistic patterns that exist
across hundreds or thousands of texts, necessitating the use of automatic analyti-
cal techniques (e.g., concordancing software to analyze the collocations of a target
word). Such analyses usually do not even acknowledge the existence of individual
texts; the goal of the analysis is to produce frequency counts for the entire
corpus, rather than analyses of each text. Thus, studies offer findings like "Pattern
A occurs with a certain frequency in this corpus," with no indication of how this
pattern is distributed within or across individual texts.[1]
1. In a few previous corpus studies, texts are the units of analysis and thus have a more central
status. For example, in multi-dimensional studies, the distribution of linguistic features is ana-
lyzed separately in each text, providing the basis for the quantitative investigation of how lin-
guistic features tend to co-occur in the texts of a corpus.
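The contrast between a corpus-wide frequency count and a text-by-text account can be illustrated with a minimal Python sketch. The toy corpus, the target word, and the helper name `rate_per_1000` are hypothetical and are not drawn from any study discussed in this book.

```python
from collections import Counter

def rate_per_1000(tokens, target):
    """Occurrences of `target` per 1,000 tokens (0.0 for an empty text)."""
    if not tokens:
        return 0.0
    return 1000.0 * Counter(tokens)[target] / len(tokens)

# Hypothetical toy corpus: text name -> list of word tokens.
corpus = {
    "text_a": ["the", "results", "suggest", "that", "the", "enzyme", "binds"],
    "text_b": ["we", "suggest", "that", "this", "may", "suggest", "a", "role"],
    "text_c": ["samples", "were", "washed", "and", "dried"],
}
target = "suggest"

# Corpus-wide figure: all tokens are pooled, so text boundaries disappear.
all_tokens = [tok for tokens in corpus.values() for tok in tokens]
corpus_rate = rate_per_1000(all_tokens, target)

# Per-text figures: one rate per text, so the distribution stays visible.
per_text_rates = {name: rate_per_1000(tokens, target)
                  for name, tokens in corpus.items()}

print(corpus_rate)
print(per_text_rates)
```

The pooled rate is a single number, whereas the per-text rates show that the target word is frequent in one text and absent from another.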
In contrast, individual texts are central to the goals of the present book. That
is, to truly integrate the study of discourse organization with corpus-based analy-
sis, we needed to develop an approach that includes a detailed analysis of each in-
dividual text, in terms that can be generalized across all texts of a corpus.
In Chapter 1, we described how these goals can be achieved through two main
methodological approaches: top-down and bottom-up. Top-down is the more traditional
approach, i.e., applying the discourse-analytic techniques previously developed
for the analysis of individual texts to an entire corpus. In fact, the top-down
approach is not necessarily corpus-based. Rather, through the intensive
efforts of dedicated researchers, these methods can be applied to each of the mul-
tiple texts of a corpus. Once that coding is completed, it is possible to apply auto-
matic techniques to describe the typical patterns of discourse organization that
hold for the entire corpus.
The bottom-up approach, on the other hand, is fundamentally corpus-based.
That is, bottom-up approaches are automated so that they can easily be applied to a
corpus of any size. Once the techniques are developed, there are no concerns about
the effort required to analyze a large corpus of texts. Of course, the major challenge
with a bottom-up approach is identifying meaningful discourse structures: produc-
ing a kind of structural analysis that is actually useful to discourse analysts.
Three specific approaches are illustrated in the present book: two top-down
approaches (move analysis and appeals analysis) and one bottom-up approach,
based on vocabulary-based discourse units. These specific approaches clearly il-
lustrate the more general characteristics described above: move analysis and ap-
peals analysis are both extremely labor-intensive, and both approaches have been
previously used for more traditional discourse analyses of selected texts. In con-
trast, the analysis of vocabulary-based discourse units is automated and can easily
be applied to a corpus of any size; this approach is specifically designed for corpus
analysis and has not been used previously for traditional discourse analysis.
As we describe in Chapter 1, there are other important differences between top-
down and bottom-up approaches. Perhaps the most important of these is the pri-
mary basis of the analysis: functional-qualitative vs. linguistic-quantitative charac-
teristics. Functional analysis is primary in top-down approaches; functional
distinctions are determined on a qualitative basis, to determine the set of relevant
discourse types and to identify specific discourse units within texts. In contrast, lin-
guistic analysis is primary in bottom-up approaches; a wide range of linguistic dis-
tributional patterns are analyzed quantitatively, again being used to determine the
set of relevant discourse types and to identify specific discourse units within texts.
A related difference is the order of analysis. In top-down approaches, the re-
searcher begins with functional-qualitative methods to develop an analytical
framework that describes the types of discourse units in the target genre. That is,
we need to fully describe the (functional) discourse types in this genre before be-
ginning the empirical analysis (segmenting the texts in the corpus into discourse
units). In these approaches, linguistic-quantitative analysis comes at an even later
step, to facilitate the interpretation of the discourse types.
In contrast, bottom-up approaches begin with a kind of linguistic-quantitative
analysis: segmenting the texts into discourse units on the basis of vocabulary distributional
patterns. Comprehensive linguistic-quantitative (lexical and grammatical)
analysis is then undertaken as the basis for identifying the types of discourse
units. Functional-qualitative analysis is a later step, to facilitate the interpretation
of the discourse types.
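The first bottom-up step, segmenting a text wherever its vocabulary shifts, can be sketched in simplified form. The Python sketch below is only an illustration of the general idea and is not the procedure used in this book: the Jaccard similarity measure, the window of 4 tokens, the threshold of 0.5, and the function names are all assumptions chosen for a toy example.

```python
def jaccard(a, b):
    """Share of word types that two word collections have in common."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def gap_similarities(tokens, window):
    """Vocabulary similarity of the `window` tokens before and after each gap."""
    sims = {}
    for gap in range(window, len(tokens) - window + 1, window):
        left = tokens[gap - window:gap]
        right = tokens[gap:gap + window]
        sims[gap] = jaccard(left, right)
    return sims

def segment(tokens, window, threshold):
    """Split `tokens` at every gap whose similarity falls below `threshold`."""
    sims = gap_similarities(tokens, window)
    boundaries = [gap for gap in sorted(sims) if sims[gap] < threshold]
    units, start = [], 0
    for boundary in boundaries:
        units.append(tokens[start:boundary])
        start = boundary
    units.append(tokens[start:])
    return units

# Hypothetical toy text: the vocabulary changes after the eighth token.
tokens = ("enzyme activity enzyme assay activity assay enzyme activity "
          "cells were grown cells were washed cells grown").split()

units = segment(tokens, window=4, threshold=0.5)
for unit in units:
    print(" ".join(unit))
```

With these toy settings the only boundary falls where the two windows share no words, which yields two units of eight tokens each.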
The two approaches have very different strengths. Top-down approaches apply
well-established research methods that are quite familiar to the wider professional
community. Further, because the analyses have a primary functional-qualitative
basis, they are directly interpretable by discourse analysts. In contrast, bottom-up
analyses have a complex quantitative-linguistic basis, analyzing numerous linguis-
tic distributional patterns to determine the discourse units within texts as well as
the general discourse types. The strengths of this approach are that it is replicable,
can easily be applied to large corpora, and produces generalizable results. Its major
weakness is that the discourse description is relatively complex, based on multivariate
quantitative-linguistic distributional patterns, and thus does not necessarily
represent the kinds of discourse constructs that can be uncovered through the
detailed discourse analysis of individual texts. But this weakness might also be
considered a strength: by approaching discourse from a radically different per-
spective, we have the possibility of identifying textual patterns that would other-
wise go unnoticed by analysts.
Given these major differences in methods, and their complementary strengths,
we would predict that the two approaches will provide different insights into the
discourse organizational patterns of a genre. At the same time, it is reasonable to
expect that the inherent structure of a genre would be reflected in analyses under-
taken from both perspectives. The following section explores these relationships
by comparing the top-down analysis of biochemistry research articles (Chapter 4)
to the bottom-up analysis of biology research articles (Chapter 7).
2 Comparing the top-down and bottom-up descriptions of biology research articles
Two separate studies in this book have focused on the discourse organization of
biology research articles. The two studies were carried out independently, so it is
not possible to directly compare the analyses of the exact same texts. Further, the
studies are not exactly comparable: Chapter 4 focuses on a specific sub-discipline
(biochemistry) and is based on analysis of research articles from five scientific
journals; Chapter 7 focuses on the more general discipline of biology (including
sub-disciplines like entomology, human genetics, and microbiology) and is based
on analysis of research articles from ten different scientific journals. However, the
similarities between these studies are stronger than the differences: both studies
investigated published academic research articles in the general discipline of biology.
The major difference between the studies is their methodological approach:
top-down move analysis in Chapter 4 versus bottom-up VBDU analysis in Chapter
7. Thus it is instructive to compare the nature of the findings from the two studies.
The discourse descriptions resulting from these two studies can be compared
with respect to four different considerations:
a) the nature of the discourse units (moves vs VBDUs) in biology research arti-
cles;
b) the dimensions of linguistic variation among discourse units in biology re-
search articles;
c) the functional and linguistic characteristics of the discourse types (move types
vs VBDU types) in biology research articles;
d) description of the typical discourse structures of biology research articles.
2.1 Discourse units in biology research articles
The first point of comparison is the nature of the discourse units that are investi-
gated in these two studies. We have already discussed the differing methodological
bases of moves and Vocabulary-Based Discourse Units (VBDUs): that is, each
move expresses a distinct communicative function, while each VBDU uses a dis-
tinct set of words. This methodological difference results in discourse units that
are somewhat different in nature. First of all, moves tend to be considerably shorter
than VBDUs. The shortest moves can be 4 or 5 words long, while the longest
moves contain 400-500 words. On average, moves are about 56 words long in the
biochemistry research articles (see Table 4.3). In contrast, the shortest VBDUs are
around 70 words, while the longest VBDUs are around 1,000 words. On average,
VBDUs are about 211 words long in the biology research articles (see Table 7.1).[2]
2. In both studies, the shortest discourse units were excluded from the quantitative-linguistic
analyses, but this restriction also reflected the different nature of these discourse units: moves
shorter than 25 words were excluded in the Chapter 4 study, while VBDUs shorter than 100 words
were excluded in the Chapter 7 study.
Moves and VBDUs also differ in terms of their textual coverage: Moves are not
necessarily continuous stretches of text. Rather, a single move includes all portions
of a text related to a single communicative function, regardless of whether those
text segments are contiguous or not. In contrast, VBDUs represent continuous
stretches of text. This difference has practical consequences for the description of
discourse organization. It is possible to describe a text as a sequence of VBDUs,
because those text segments in fact occur in sequential order. In contrast, moves
do not necessarily correspond to the actual sequence of words in a text. To some
extent, it is possible to describe a typical sequence of moves (as in Chapter 3, Sec-
tion 3.6), but this task is complicated by the fact that moves can be interspersed
and overlapping in a text.
In sum, moves and VBDUs are similar in that they both provide ways to seg-
ment a text into smaller discourse units, and thus analyze the internal discourse
organization of a text. They differ, however, in their methodological bases, and in
the actual extent of text included in the unit.
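The difference in textual coverage summarized above can be made concrete as a difference in data representation. The following Python sketch is a hypothetical illustration: the class names, the word offsets, and the move labels are invented for the example and do not reproduce the coding schemes of Chapter 4 or Chapter 7.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) word offsets; end is exclusive

@dataclass
class Move:
    move_type: str
    spans: List[Span]  # one move may cover several separate stretches of text

@dataclass
class VBDU:
    span: Span  # a VBDU is always one continuous stretch of text

def is_contiguous(spans):
    """True if the spans, put in order, follow one another without gaps."""
    ordered = sorted(spans)
    return all(prev_end == next_start
               for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]))

# Hypothetical 200-word text: move A is interrupted by move B.
move_a = Move("hypothetical move A", [(0, 40), (95, 200)])
move_b = Move("hypothetical move B", [(40, 95)])

# The same text as a sequence of two VBDUs.
vbdus = [VBDU((0, 70)), VBDU((70, 200))]

print(is_contiguous(move_a.spans))             # move A is split in two
print(is_contiguous([v.span for v in vbdus]))  # VBDUs tile the text in order
```

Because every VBDU holds exactly one span, a text can be read off as an ordered list of VBDUs; a move holding several spans has no single position in such a sequence.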
2.2 The dimensions of linguistic variation in biology research articles
In both Chapter 4 and Chapter 7, multi-dimensional (MD) analyses were carried
out. As noted in earlier chapters, the goal of this kind of analysis has usually been
to identify the underlying parameters of linguistic variation among texts within
the target discourse domain, which usually includes several different registers (e.g.,
elementary school registers or university registers).
In contrast, the MD analyses in the present book describe the patterns of vari-
ation within a single written genre. If these analyses had been based on complete
research articles, we would expect two general results: 1) that the two MD analyses
would be nearly identical, and 2) that only minor patterns of variation would be
uncovered. The first anticipated result is the easiest to understand: Given that the
two corpora were constructed to represent highly similar discourse domains, we
would predict that the MD analysis would uncover similar dimensions of varia-
tion. The second predicted result requires a little more explanation. Like all cor-
relational statistical techniques, MD analysis requires variation to achieve mean-
ingful results. That is, if linguistic features do not vary in their rate of occurrence
across the texts of a corpus, this technique will not succeed in identifying patterns
of co-variance (or linguistic co-occurrence). When we consider a corpus of scientific
research articles from a single discipline (where each text is a complete research
article), we find extremely little linguistic variation across texts. Thus, if the
MD analyses in the present book had been based on complete research articles, it
is unlikely that we would have discovered interpretable linguistic patterns.
However, there is extensive linguistic variation within scientific research arti-
cles. That is, the sub-sections and sub-texts within research articles are quite differ-
ent in their communicative purposes, and those discourse units therefore differ
considerably in their typical linguistic characteristics. One general goal in this
book has been to first segment texts into smaller discourse units that are linguistically
well-defined, so that we can then track patterns of linguistic variation within
texts. Chapters 4 and 7 adopt different approaches to capturing that text-internal
variation, segmenting research articles in different ways into fundamentally differ-
ent kinds of sub-texts (moves versus VBDUs). And the MD analyses in these two
chapters were then based on those different kinds of sub-texts.
From a statistical point of view, this procedure creates a corpus with linguistic
variation. That is, there is extremely little variation among texts in a corpus of re-
search articles. But if those same research articles are segmented into moves or VB-
DUs, we will discover extensive linguistic differences in a corpus composed of those
sub-texts. The question that we take up in the present section is whether we find
similar patterns of linguistic variation in a corpus of moves and a corpus of VBDUs.
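The statistical point can be illustrated with a small numerical sketch. The rates below are invented for the example, and the sketch assumes, for simplicity, that all sub-texts of an article are equally long, so that the rate for a whole article is the plain mean of its sub-text rates.

```python
from statistics import mean, pvariance

# Hypothetical rates (per 1,000 words) of one linguistic feature, e.g. past
# tense verbs, in three sub-texts of each of two research articles.
subtext_rates = {
    "article_1": [10.0, 80.0, 20.0],
    "article_2": [12.0, 78.0, 22.0],
}

# Rate for each complete article (assumes equally long sub-texts).
article_rates = [mean(rates) for rates in subtext_rates.values()]

# Variation when each complete article is one observation.
variance_across_articles = pvariance(article_rates)

# Variation when each sub-text is one observation.
all_subtexts = [r for rates in subtext_rates.values() for r in rates]
variance_across_subtexts = pvariance(all_subtexts)

print(variance_across_articles)
print(variance_across_subtexts)
```

In this toy case the two articles are nearly indistinguishable as wholes, while their sub-texts differ sharply; only the second set of observations gives a correlational technique any variation to work with.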
Although these two MD analyses are based on different kinds of sub-texts,
they are similar in two crucial respects: 1) the units of analysis cover the full extent
of each text, and thus taken together, they cover the full range of linguistic variabil-
ity found in these research articles; and 2) each of the two multi-dimensional anal-
yses is based on a relatively comprehensive set of linguistic features. As a result, the
two MD analyses identify similar parameters of variation. The MD analysis of bio-
chemistry moves identified 7 factors, while the biology VBDU study identified
only 4 factors. However, there are close correspondences in both studies for those
four factors. Table 9.1 lists the correspondences between the two analyses.
Table 9.1 shows that the four factors identified in the MD analysis of biology
VBDUs all have highly similar corresponding dimensions in the MD analysis of
biochemistry moves, based on the sets of co-occurring linguistic features, as well
as the functional interpretations assigned to each dimension. Some differences
between the two analyses are due to the fact that a larger set of features were in-
cluded in the biology VBDU study (especially semantic classes of nouns and verbs,
such as abstract nouns, process nouns, activity verbs, communication verbs).
However, these additional features co-occur with the core features included in
both analyses, rather than defining additional dimensions in the VBDU analysis.
As a result, the four dimensions identified in the VBDU analysis all have close
counterparts in the move analysis.
The positive features for Dimension 1 in the move analysis are long words, attributive
adjectives, and nouns, nearly identical to the positive features for Dimension 4
in the VBDU analysis. (The major difference is that Dimension 4 in the VBDU
analysis includes two specific semantic classes of nouns: abstract nouns and process
nouns.) Both analyses interpret this dimension as relating to (abstract) conceptual
discussion (as opposed to concrete reference in the case of the move analysis).
Table 9.1 Comparison of the corresponding dimensions in the MD analysis of biochemistry moves (Ch 4)
and the MD analysis of biology VBDUs (Ch 7)

Factor analysis of biochemistry moves         | Factor analysis of biology VBDUs
(see Table 4.4)                               | (see Table 7.3)
Factor number        | Linguistic features    | Factor number         | Linguistic features
& interpretation     |                        | & interpretation      |
---------------------+------------------------+-----------------------+--------------------------------------
1: Conceptual        | long words             | 4: Abstract /         | nominalizations
versus               | attributive adjectives | Theoretical           | long words
Concrete Reference   | nouns                  | Discussion of         | abstract nouns
                     | versus                 | Concepts              | process nouns
                     | numerals               |                       | attributive adjectives
                     | acronyms and jargon    |                       |
---------------------+------------------------+-----------------------+--------------------------------------
2: Concrete Action   | passive voice verbs    | 3: Procedural         | passive voice verbs
versus               | past tense             | Description of        | activity verbs
Abstract Discussion  | coordinators           | Actions / Events      | past tense
                     | versus                 |                       | progressive aspect
                     | definite articles      |                       | time adverbials
                     | nominalizations        |                       |
                     | prepositions           |                       |
---------------------+------------------------+-----------------------+--------------------------------------
3: Evaluative Stance | extraposed It          | 1: Evaluation of      | predicative adjectives
                     | adjective+that-clause  | Possible Explanations | adjective+to-clause
                     | predicative adjectives |                       | adjective+that-clause
                     |                        |                       | modal verbs
                     |                        |                       | linking adverbials
                     |                        |                       | causative / conditional subordination
---------------------+------------------------+-----------------------+--------------------------------------
5: Attributed        | present tense          | 2: Current State      | communication verbs
Knowledge            | references             | of Knowledge          | present tense
versus               | type/token ratio       | versus                | perfect aspect
Current Study        | versus                 | Past Events           | epistemic verb + that-clause
                     | nouns                  | and Actions           | versus
                     | past tense             |                       | past tense
Dimension 2 in the move analysis corresponds to Dimension 3 in the VBDU analysis.
The main features shared by these dimensions are passive voice verbs and past
tense. The VBDU analysis additionally has activity verbs and time adverbials load-
ing on this dimension. In both analyses, this dimension was interpreted as a (pro-
cedural) description of actions (or events).
Dimension 3 in the move analysis corresponds to Dimension 1 in the VBDU
analysis. Both dimensions included predicative adjectives, especially controlling a
complement clause; this dimension in the VBDU analysis additionally included
modal verbs and causative/conditional adverbial subordination. In both analyses,
this dimension was interpreted as evaluative stance (comparing the strengths of
competing explanations).
Finally, Dimension 5 in the move analysis corresponds to Dimension 2 in the
VBDU analysis. The basic opposition represented in both of these dimensions is
present tense versus past tense. In Dimension 5 of the move analysis, present tense
co-occurs with frequent citations to previous research and a high type/token ratio.
The VBDU analysis did not include citations as a linguistic feature, but it did in-
clude the semantic class of communication verbs, which loads strongly with
present tense (and perfect aspect) on Dimension 2. The interpretive labels here are
somewhat different, but the prose descriptions in the two chapters show that sim-
ilar underlying functions are associated with the two dimensions. The positive Di-
mension 5 features in the move analysis are interpreted as attributed knowledge,
while the positive Dimension 2 features in the VBDU analysis are interpreted as
current state of knowledge. In both cases, these features are interpreted as present-
ing the findings of previous research, describing what we already know as the
backdrop to the present study. Present tense is used for these descriptions to em-
phasize that this is the current state of our knowledge, even though it is based on
previous studies. In contrast, the negative pole of both dimensions uses past tense
verbs to describe the actual actions and events of the present study. For Dimension
5 of the move analysis, this pole is labeled current study in opposition to the cur-
rent state of knowledge (i.e., summarizing previous studies) in the label of VBDU
Dimension 2.
Taken together, this comparison shows a strong set of correspondences be-
tween the two independent MD analyses, based on different corpora, targeting
slightly different genres, analyzed with respect to slightly different sets of linguistic
features, and each interpreted on its own terms by different researchers. Most im-
portantly for our purposes here, the two MD analyses are based on different kinds
of discourse units: In Chapter 4, research articles are segmented into moves, while
in Chapter 7, research articles are segmented into VBDUs. As noted above, the
MD approach is statistically feasible only after segmenting the corpus of research
articles into smaller linguistic sub-texts. It is reasonable to expect that we might
uncover different parameters of variation when comparing the linguistic characteristics
of moves to the linguistic characteristics of VBDUs. However, the comparison
here suggests that both approaches to text segmentation result in discourse
units that capture the range of linguistic variability in this discourse domain, and
as a result, the four major dimensions identified in the VBDU MD analysis have
close counterparts in the move MD analysis.[3]
2.3 The functional and linguistic characteristics of the discourse types (move
types vs VBDU types) in biology research articles
The two approaches to discourse structure in the present book have fundamentally
different bases for determining the discourse types in a genre: functional criteria in
the case of move analysis versus linguistic criteria in the case of VBDU analysis.
Given this difference, it might be supposed that the discourse types identified in the
two analyses would bear no resemblance to one another. There are in fact major
differences in the two sets of discourse types. Fifteen different move types were
identified in the analysis of biochemistry research articles, and these were further
broken down into 29 different steps in addition to the 7 move types that do not
have steps: a total of 36 different discourse types, each with a distinct communica-
tive function. In contrast, only 6 different discourse types were identified in the
VBDU study. Thus, it is clear that the move analysis produces a much more detailed
description of the different kinds of discourse types than the VBDU analysis.
The functional basis of move analysis is the main factor that permits this level
of detail: The researcher asks at each point in a text what specific communicative
goals the author is trying to achieve. There is no a priori limit on the number of
goals in a text or genre, allowing for a very fine-grained description. In contrast,
VBDU types must be linguistically different. Further, the specific approach illus-
trated in Chapter 7 is based on only 4 linguistic predictors: the 4 dimensions of
variation. VBDU types are further different from move types in that they are in-
tended to represent only the major linguistic groupings of discourse units. That is,
the dimensions of variation represent the major parameters of linguistic variation
in this discourse domain, and the VBDU types correspondingly represent only the
major clusters of discourse units as defined by those linguistic parameters. Thus,
whereas move types can take into account fine-grained distinctions of communi-
cative purpose, the VBDU types are much more general, in this case accounting
for only six major groupings of linguistically distinct discourse units.
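The idea of deriving discourse types as clusters of units in a space defined by four dimension scores can be sketched as follows. This is a hypothetical illustration only: the plain k-means procedure, the fixed starting centroids, and the invented dimension scores are assumptions made for the example, and the sketch does not reproduce the clustering procedure or the results of Chapter 7.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, centroid))
             for centroid in centroids]
    return dists.index(min(dists))

def kmeans(points, centroids, iterations=10):
    """Plain k-means from given starting centroids; returns (labels, centroids)."""
    centroids = [list(c) for c in centroids]
    labels = [nearest(p, centroids) for p in points]
    for _ in range(iterations):
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
        labels = [nearest(p, centroids) for p in points]
    return labels, centroids

# Hypothetical discourse units, each described by four dimension scores.
unit_scores = [
    (5.0, -1.0, 0.5, 0.0),
    (4.5, -0.5, 0.0, 0.5),
    (-1.0, 0.0, 6.0, -0.5),
    (-0.5, 0.5, 5.5, 0.0),
]
start_centroids = [unit_scores[0], unit_scores[2]]

labels, centroids = kmeans(unit_scores, start_centroids)
print(labels)
```

Each resulting cluster groups units with similar scores on all four dimensions, so the number of types is bounded by how many linguistically distinct groupings the scores support, not by how many communicative purposes an analyst could name.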
3. The major difference between the two analyses is due to the fact that the factor analysis in
the move study extracted seven dimensions while the VBDU factor analysis extracted only four
dimensions.
Despite these differences, there are important similarities in the discourse
types identified by the two approaches. One way to describe those similarities is to
compare the functional interpretations of the VBDU types (see Sections 6-8 of
Chapter 7) to the move types identified in Chapter 4 (see Table 4.1). Table 9.2 lists
some of the most obvious correspondences, where the functional basis of specific
move types seems to clearly correspond to the functional interpretation of a VBDU
type. Table 9.3 then lists additional correspondences that are more tentative, usu-
ally because the functional interpretation of the VBDU type is more general.
As summarized in Table 9.2, several discourse types identified in the two anal-
yses seem to have close correspondences to one another. In general, a single VBDU
type corresponds to several different move types (and often to specific steps within
a particular move type). This pattern makes sense given that VBDU types are more
general than move types (see discussion above). In addition, a single VBDU type
corresponds to moves/steps from across the extent of a research article. For exam-
ple, VBDU Type 2 (Procedural description of past actions) seems to correspond to
moves/steps that occur in all four major sections of research articles: the Introduction
(Move 3-Step 2), Methods (Move 5-Steps 1, 2, 3; and Move 7), Results
(Move 8-Step 4), and Discussion (Move 13-Step 1). Similarly, VBDU Type 5 (Presentation
of the current state of knowledge) seems to correspond to moves/steps that occur in
both the Introduction (Moves 1, 2) and Discussion (Moves 12, 13).
In contrast, the three VBDU Types listed in Table 9.3 have more general func-
tional interpretations, making it more difficult to identify specific corresponding
move types. VBDU Type 3 is interpreted as Report of past events, but it is not en-
tirely clear what moves/steps report past events (apart from the procedural moves
that correspond to VBDU Type 2). One likely candidate here is Move Type 4,
Step 2 (Detailing the source of materials), which seems to often provide an account
of how materials were obtained (a kind of past tense narration). Similarly, it is not
clear which move types correspond to VBDU Type 4 (Abstract elaborated discussion)
and VBDU Type 6 (Current abstract/theoretical discussion). It seems likely
that many of the move types in these articles might present abstract discussion of
this type (see the list in Table 9.3), but the functional descriptions of move types in
Chapter 4 do not in general distinguish between abstract / theoretical discussion
versus more concrete description. (Interestingly, the MD analysis of moves does
reflect the distinction between abstract / conceptual discussion versus more con-
crete description.)
Table 9.2 Strong correspondences between the functional interpretations of VBDU Types and Move Types

VBDU Type 1: Current evaluation of implications and explanations
    Move Type 11: Commenting on results
    Move Type 13: Consolidating results
VBDU Type 2: Procedural description of past actions and events
    Move Type 3: Introducing the present study
        Step 2: Describing procedures
    Move Type 5: Describing experimental procedures
        Step 1: Documenting established procedures
        Step 2: Detailing procedures
        Step 3: Providing the background of the procedures
    Move Type 7: Describing statistical procedures
    Move Type 8: Restating methodological issues
        Step 4: Listing procedures or methodological techniques
    Move Type 13: Consolidating results
        Step 1: Restating methodology (purposes, research questions, hypotheses, and procedures)
VBDU Type 5: Presentation of the current state of knowledge
    Move Type 1: Establishing a topic
    Move Type 2: Preparing for the present study: Indicating a gap/raising a question
    Move Type 12: Contextualizing the study
        Step 1: Describing established knowledge
        Step 2: Generalizing, claiming, deducing previous knowledge

Table 9.3 Tentative correspondences between the functional interpretations of VBDU Types and Move Types

VBDU Type 3: Report of past events
    Move Type 1: Establishing a topic
        [if this move reports what previous studies accomplished]
    Move Type 4: Describing materials
        Step 2: Detailing the source of the materials
            [if this Step reports where/how materials were obtained]
    Move Type 10: Announcing results
        [if this move reports what the authors found (in the past tense)]
VBDU Type 4: Abstract elaborated discussion (not evaluative and not procedural)
VBDU Type 6: Current abstract/theoretical discussion
    Move Type 3: Introducing the present study
        Step 1: Stating purpose(s)
        Step 3: Presenting findings
    Move Type 8: Restating methodological issues
        Step 1: Describing aims and purposes
        Step 2: Stating research questions
        Step 3: Making hypotheses
    Move Type 9: Justifying methodological issues
    Move Type 11: Commenting on results
        Step 5: Summarizing
        Step 6: Exemplifying
    Move Type 14: Stating limitations of the study

In sum, there are surprising correspondences between the move types identified in
Chapter 4 and the VBDU types identified in Chapter 7, despite the major differ-
ences in the research approaches used to identify the two types of discourse units.
In the present section, we matched discourse types based on their functional inter-
pretations in the two approaches. This matching leads to certain predictions, al-
lowing us to test the interpretive bases of the two approaches. For example, we
know that Move Type 5 (Describing experimental procedures) always occurs in
Methods sections. If we are correct in claiming that VBDU Type 2 (Procedural
description of past actions) corresponds to Move Type 5, then the analysis in Chap-
ter 7 should have shown that this VBDU Type also commonly occurs in Methods
sections. We consider predictions of this type in the next section.
2.4 Description of the typical discourse organization of biology research articles
In the last section, we compared the functional bases of discourse types in the
move analysis of biochemistry research articles to those identified in the VBDU
analysis of biology articles. We identified several correspondences, where discourse
types were posited to serve similar functions in the two analyses.
In the present section, we shift our attention to the question of how these dis-
course types are used to construct research articles. In the case of move analysis,
this question is relatively straightforward: each section of an article is composed of
a unique set of moves, and these are described in the order in which they typically
occur. Thus, for example, Move Type 1 (Establishing a topic) usually occurs before
Move Type 2 (Preparing for the present study), which in turn usually occurs before
Move Type 3 (Introducing the present study); all three of these move types occur
only in Introductions. Section 5 in Chapter 4 notes that there can be variation in
the order of move types within an article section. In addition, moves are not neces-
sarily continuous stretches of text. However, the order of move types given in Ta-
ble 4.1 is described as the most common sequence within these articles.
In contrast, VBDU types are not identified or defined in any way by reference
to their position within the text. Thus, VBDUs can potentially occur in any order
within a text, and a given VBDU Type can potentially occur in any section of a
research article.
Despite these differences, it is informative to compare the results from move
analysis and VBDU analysis regarding the typical sequence of discourse types
within texts. For move analysis, this is based directly on Table 4.1 (in Chapter 4),
which lists the move types within each article section, in their most common or-
der. For VBDU analysis, this is based on Figure 7.4 (in Chapter 7), which shows
the most common VBDU Type in each section, as well as Figures 7.5–7.7, which
compare the preferred VBDU Types at the beginning and end of each section.
Table 9.4 summarizes the most common sequential organization of these re-
search articles from the two analytical perspectives. This comparison reinforces
and further elucidates many of the functional interpretations in Chapters 4 and 7.
For example, Table 9.4 (cf. Figure 7.5) shows that VBDU Type 5 (Current state of
knowledge) is preferred in the initial position in article Introductions, while VBDU
Type 4 (Abstract elaborated discussion) is preferred in the final position of Intro-
ductions. This finding agrees well with the results of the move analysis, which
shows how the first two move types in Introductions establish a topic and indi-
cate a gap by surveying previous research (i.e. describing the current state of
knowledge) while the third move type introduces the present study, which appar-
ently requires the use of abstract elaborated discussion.
Table 9.4 Comparing the sequential organization of research articles: Move types versus VBDU types
Move Types                                              VBDU Types
INTRODUCTION                                            INTRODUCTION
Move 1: Establishing a topic                            Beginning:
Move 2: Preparing for the present study:                  VBDU Type 5: Current state of knowledge
        Indicating a gap/raising a question               (also VBDU Types 3, 4, 6)
Move 3: Introducing the present study
  Step 1: Stating purpose(s)                            End:
  Step 2: Describing procedures                           VBDU Type 4: Abstract elaborated discussion
  Step 3: Presenting findings                             (also VBDU Types 3, 5, 6)
METHODS                                                 METHODS
Move 4: Describing materials                            Beginning:
  Step 1: Listing materials                               VBDU Type 3: Report of past events
  Step 2: Detailing the source of the materials           (also VBDU Type 2)
  Step 3: Providing the background of the materials
Move 5: Describing experimental procedures
  Step 1: Documenting established procedures            End:
  Step 2: Detailing procedures                            VBDU Type 2: Procedural description of past actions
  Step 3: Providing the background of the procedures      (also VBDU Type 3)
Move 6: Detailing equipment
Move 7: Describing statistical procedures
RESULTS                                                 RESULTS
Move 8: Restating methodological issues                 Most common VBDU Type:
  Step 1: Describing aims and purposes                    VBDU Type 3: Report of past events
  Step 2: Stating research questions
  Step 3: Making hypotheses
  Step 4: Listing procedures or methodological
          techniques
Move 9: Justifying methodological issues
Move 10: Announcing results
  Step 1: Reporting results
  Step 2: Substantiating results
  Step 3: Invalidating results
Move 11: Commenting results
  Step 5: Summarizing
DISCUSSION                                              DISCUSSION
Move 12: Contextualizing the study                      Beginning:
  Step 1: Describing established knowledge                VBDU Type 5: Current state of knowledge
  Step 2: Generalizing, claiming, deducing                (also VBDU Types 4, 6)
          previous knowledge
Move 13: Consolidating results
  Step 1: Restating methodology (purposes, research
          questions, hypotheses, and procedures)
  Step 2: Stating selected findings
  Step 4: Explaining differences in findings            End:
  Step 5: Making overt claims/generalizations             VBDU Type 6: Current abstract discussion
  Step 6: Exemplifying                                    (VBDU Type 1: Current evaluation of explanations)
Move 14: Stating limitations of the study                 (also VBDU Types 4, 5)
Move 15: Suggesting further research
A second example comes from Methods sections, which have a strong preference
for VBDU Type 3 (Report of past events) at the beginning, and VBDU Type 2 (Pro-
cedural description of past actions) at the end (see Figure 7.6). This pattern agrees
with the sequence of move types identified in Methods sections, with procedural
descriptions (experimental and statistical) coming after the description of materi-
als (see Table 9.4). That is, VBDU Type 2 (Procedural description) seems to clearly
correspond functionally to Move Types 5 and 7 (Experimental procedures and Sta-
tistical procedures).
As noted in Section 2.3 above, it is less clear from the functional interpreta-
tions how VBDU Type 3 (Report of past events) corresponds to Move Type 4 (De-
scribing materials). However, the sequential comparison summarized in Table 9.4
indicates that these two do have a strong correspondence: the same discourse
units occurring at the beginning of Methods sections describe materials from a
move-analysis perspective, and they report past events from a VBDU perspec-
tive. Findings like this should lead to useful future research, enhancing the de-
scriptions of both the move descriptions and the VBDU descriptions. In the
present case, this correspondence indicates that the description of materials in
Methods sections often utilizes a narrative mode of discourse, reporting the past
actions and events used to obtain experimental materials, rather than a simple
descriptive itemization of materials. The preferred sequence of VBDU Type 3 (Re-
port of past events) before VBDU Type 2 (Procedural description) also raises an
additional possibility when compared to the move analysis: that there might be
two different linguistic styles for Move Type 5 (Describing experimental proce-
dures). That is, it might be the case that experimental procedures are introduced
using simple past tense report (i.e. VBDU Type 3), followed by a more detailed
passive voice description of the actual actions performed to carry out the experi-
ment (VBDU Type 2).
The two analytical perspectives are also in general agreement in their descrip-
tion of Discussion sections. VBDU Type 5 (Current state of knowledge) is preferred
at the beginning of Discussion sections, which corresponds closely to Move Type
12 (Describing established knowledge and Generalizing ... previous knowledge).
VBDU Type 6 (Current abstract discussion) is preferred at the end of Discussion
sections, which seems to correspond to Move 13, Steps 4–5 (Explaining differences
in findings and Making overt claims/generalizations). (VBDU Type 1, Current
evaluation of explanations, is also strongly preferred at the end of Discussion
sections when considered in relative terms.)
At the same time, there are several aspects of the comparison between these
two analyses that are problematic, raising useful questions that can help guide fu-
ture research. The main reason for the discrepancies has to do with the different
analytical bases of the two approaches: move types are functional distinctions
while VBDU types are linguistic distinctions. Thus, because the two approaches
have complementary strengths, a more detailed comparison of the resulting analy-
ses should greatly facilitate our understanding of the typical discourse structure of
the target genre.
For example, while Table 9.4 shows that the preferred order of VBDU types in
Introductions agrees well with the preferred sequence of move types, other se-
quences of VBDU types in Introductions are also possible (see Figure 7.5), with
VBDU Types 3, 4, 5, and 6 all being relatively common in both the initial and final
position of Introductions. Because VBDUs have a linguistic basis, this finding
shows that there is considerable variability in the linguistic styles of the initial and
final discourse units in research article introductions. In contrast, the order of
move types seems to be relatively fixed in article introductions: Move 1 (Establish-
ing a territory), followed by Move 2 (Establishing a niche), followed by Move 3
(Occupying a niche). That is, the functional progression of article introductions
seems to be relatively invariant.
These findings might be reconciled in two ways. First, it is possible that Move
Types 1, 2, and 3 do not always occur in this order. This possibility is discussed in
Chapter 4, which notes that the sequence of move types is not fixed, and that
moves are not necessarily continuous stretches of text. However, previous studies
of research article introductions all generally agree that the move types in RA
introductions usually occur in a relatively fixed order.
This raises the possibility of a second explanation for the two patterns: that a
single move type can be realized by multiple linguistic styles. Future research on
this possibility might eventually lead to the identification of distinct linguistic sub-
types for a given move type. That is, move types are defined on the basis of their
communicative function, but the present comparison indicates that there could be
fairly extensive linguistic variability in the realization of those functions. More
detailed future research on the linguistic styles used to realize a single move type
should help to clarify the variability in this article section.
In contrast, we find the opposite pattern for Results sections: Table 9.4 shows
that this section contains 13 distinct moves or steps, but these are all usually real-
ized linguistically as the single VBDU Type 3 (Report of past events). In this case,
the move analysis indicates extensive functional variability within this article sec-
tion, but the VBDU analysis indicates only a single linguistic style of expression.
Here again, more detailed comparison of the linguistic characteristics of individu-
al move types should help us to clarify the relationship between function and lin-
guistic expression within this article section.
In sum, the comparison here shows that the overall patterns of discourse or-
ganization documented by move analysis are generally compatible with those
documented by VBDU analysis. At the same time, each approach identifies par-
ticular patterns of variation that help to clarify and extend our understanding of
discourse resulting from the complementary approach. And most interestingly,
there are some apparent discrepancies between the two approaches, which do not
have ready explanations. We hope that future research focusing on these areas of
apparent discrepancy will contribute new knowledge to our understanding of the
discourse of the target genre.
3 Summary and prospects for future research
The present work is one of the first book-length explorations of how corpus-based
methods can be applied to the description of discourse organization. As such, we
have only been able to scratch the surface of this important research area. Numer-
ous avenues for future research are obvious: applying the methods described here
to additional genres; developing these research methods further; and most impor-
tantly, developing additional methodological approaches to these research issues.
In our chapters, we have applied top-down and bottom-up analyses to texts
from two written genres, one genre from academic discourse (research articles)
and another genre from professional discourse (fund raising letters). These same
approaches could be applied to other important written genres, such as newspaper
editorials (Pak & Acevedo, Forthcoming; Wang, Forthcoming), grant proposals
(Connor & Mauranen, 1999; Feng, Forthcoming), book reviews (Suarez & Moreno,
Forthcoming), business letters (Loukianenko, Forthcoming), and websites (McBride,
Forthcoming). Previous studies of these genres have used top-down analyses
to understand discourse structures such as moves and topic–solution top-level
structures. Combining such analyses with bottom-up analyses should result in
more linguistically based descriptions of these genres. Such an approach will provide
useful information that could lead into (semi-)automated top-down analyses,
which will enable the use of larger corpora.
The analyses conducted for the chapters in this book have viewed texts as mono-modal
paper-based entities. Yet, technology today increasingly develops texts
into multi-modal forms that rely on visuals and digital media. In our study, pictures,
headings, white space, and other such multimodal textual features were not
coded. Increasingly, however, the multimodality of texts has been found to be im-
portant in the comprehension as well as production of texts (Kress & van Leeu-
wen, 1990, 2001; Ventola, Charles, & Kaltenbacher, 2004). We need to continue
working on ways to incorporate the multimodal properties of texts in our corpus
design and analysis, as well as build ways to analyze digital texts and hypertext,
which often rely on different logics of organization.
Another consideration for corpus-based descriptions of discourse is the
broader contexts around texts. Recent methods of text analysis go beyond texts
and include observations of writers and readers, interviews with writers and focus
groups as well as ethnographic inquiries (Bazerman & Prior, 2004). Some corpus-
based research has also been complemented by more detailed contextual analysis,
such as oral interviews. For example, Hyland (1998) used specialist informants in
the study of hedging devices in a corpus of 80 research articles. Connor (2000)
interviewed five experienced academic grant proposal writers to validate the move
analysis system used to analyze a corpus of grant proposals. Future corpus research
of discourse should more systematically incorporate such descriptions of
context obtained through surveys, interviews, observations, and ethnographies.
Thus, we hope that the present work will be the starting point for a new area of
research. For discourse analysts, we hope to have shown that there are systematic,
generalizable patterns of structure and organization across the texts of a genre,
complementing other perspectives such as the detailed description of an individ-
ual text and more abstract socio-cultural descriptions of a genre. And for corpus
linguists, we hope to have shown that discourse can be studied within particular
texts but generalized across a large collection of texts, complementing studies of
language use that use a corpus merely as a large body of linguistic forms in context.
In a sense, we are advocating a return to the research interests of previous decades,
when texts were often analyzed for their internal structure and organization. What
we have added here is the corpus perspective, showing how these research goals
can be investigated across multiple texts, resulting in generalizable descriptions of
discourse organization for a target genre.
Appendix 1
A brief introduction to
multi-dimensional analysis
Sections A.1–A.3 of the following appendix are adapted from Chapters 1–2 of
Variation in English: Multi-Dimensional Studies, edited by Susan Conrad and
Douglas Biber, published by Longman (2001).
A.1 Conceptual introduction to the multi-dimensional approach to variation
Multi-dimensional (MD) analysis was developed as a methodological approach to:
(1) identify the salient linguistic co-occurrence patterns in a language, in empirical/quantitative
terms; and (2) compare spoken and written genres/registers in the
linguistic space defined by those co-occurrence patterns. The approach was first
used in Biber (1985, 1986) and then developed more fully in Biber (1988).
The salient characteristics of the MD approach are listed below:
• The research goal of the approach is the linguistic analysis of texts, genres/registers,
  and text types, rather than analysis of individual linguistic constructions.
• The importance of variationist and comparative perspectives is assumed by
  the approach. That is, the approach is based on the assumption that different
  kinds of text differ linguistically and functionally so that analysis of any one or
  two text varieties is not adequate for conclusions concerning a discourse domain.
  For example, considering only academic prose and fiction would not
  give an accurate representation of writing; rather, many other written varieties,
  such as newspaper reports, editorials, personal letters, etc., also would
  need to be included.
• The approach is explicitly multi-dimensional. That is, it is assumed that multiple
  parameters of variation will operate in any discourse domain.
• The approach is empirical and quantitative. Analyses are based on frequency
  counts of linguistic features, describing the relative distributions of features
  across texts. The linguistic co-occurrence patterns that define each dimension
  are identified empirically using multivariate statistical techniques.
• The approach synthesizes quantitative and qualitative/functional methodological
  techniques. That is, the statistical analyses are interpreted in functional terms,
  to determine the underlying communicative functions associated with each
  distributional pattern. The approach is based on the assumption that statistical
  co-occurrence patterns reflect underlying shared communicative functions.
The notion of linguistic co-occurrence has been given formal status in the MD ap-
proach, in that different co-occurrence patterns are analyzed as underlying dimen-
sions of variation. The co-occurrence patterns comprising each dimension are
identified quantitatively. That is, based on the actual distributions of linguistic fea-
tures in a large corpus of texts, statistical techniques (specifically factor analysis)
are used to identify the sets of linguistic features that frequently co-occur in texts.
It is not the case, though, that quantitative techniques are sufficient in them-
selves for MD analyses of genre/register variation. Rather, qualitative techniques
are required to interpret the functional bases underlying each set of co-occurring
linguistic features. The dimensions of variation have both linguistic and function-
al content. The linguistic content of a dimension comprises a group of linguistic
features (e.g., nominalizations, prepositional phrases, attributive adjectives) that
co-occur with a high frequency in texts. Based on the assumption that co-occurrence
reflects shared function, these co-occurrence patterns are interpreted in
terms of the situational, social, and cognitive functions most widely shared by the
linguistic features. That is, linguistic features co-occur in texts because they reflect
shared functions. A simple example is the way in which first and second person
pronouns, direct questions, and imperatives are all related to interactiveness. Con-
tractions, false starts, and generalized content words (e.g., thing) are all related
to the constraints imposed by real-time production. The functional bases of other
co-occurrence patterns are less transparent, so that careful qualitative analyses of
particular texts are required to help interpret the underlying functions.
A.2 Overview of methodology in the multi-dimensional approach
All MD analyses, such as those in Chapters 4, 7, and 8 of the present book, follow
the same methodological steps. These steps are summarized in Table A.1, while
the following paragraphs discuss each step in greater detail.
Table A.1 The eight methodological steps of a complete multi-dimensional analysis
1. An appropriate corpus is designed based on previous research and analysis. Texts are col-
lected, transcribed (in the case of spoken texts), and input into the computer. The situation-
al characteristics of each spoken and written register are noted (e.g., purposes of the register,
production circumstances, and other characteristics discussed in chapter 1).
2. Research is conducted to identify the linguistic features to be included in the analysis, to-
gether with functional associations of the linguistic features.
3. Computer programs are developed for automated grammatical analysis, to identify or tag
all relevant linguistic features in texts.
4. The entire corpus of texts is tagged automatically by computer, and all texts are edited inter-
actively to insure that the linguistic features are accurately identified.
5. Additional computer programs are developed and run to compute frequency counts of each
linguistic feature in each text of the corpus.
6. The co-occurrence patterns among linguistic features are analyzed, using a factor analysis of
the frequency counts.
7. The factors from the factor analysis are interpreted functionally as underlying dimensions
of variation.
8. Dimension scores for each text with respect to each dimension are computed; the mean
dimension scores for each register are then compared to analyze the salient linguistic simi-
larities and differences among the registers being studied.
MD analyses can be conducted to study many different varieties of language,
from the full range of spoken/written genres/registers in a language to a specific
subgenre. The first requirement for any MD analysis, therefore, is to compile a text
corpus that represents the variety being studied. Texts must be sampled from all
genres/registers included in the target discourse domain (see Section 3 in Chapter
1). The corpora used in the studies described in Chapters 4, 7, and 8 are all exam-
ples of the kind of corpus needed to conduct an MD analysis.
A second preliminary task in MD analysis is to identify the linguistic features
to be used in the analysis. The goal here is to be as inclusive as possible, identifying
all linguistic features (including lexical classes, grammatical categories, and syn-
tactic constructions) that might have functional associations. Thus, any feature
associated with particular communicative functions, or used to differing extents
in different text varieties, is included. Occurrences of these features are counted in
each text of the corpus, providing the basis for the subsequent statistical analyses.
Computer programs are usually used to tag the words in corpus texts for vari-
ous lexical, grammatical, and syntactic categories, and to compile frequency counts
of linguistic features. The tagger used in previous MD studies (developed by Biber)
marks the word classes and syntactic information required to automatically iden-
tify the linguistic features listed in the last section. Biber (1988, Appendix II; 1993)
provides a description of an early version of the tagging program. Biber, Conrad,
and Reppen (1998, Methodology Boxes 4 and 5) provide a general description of
tagging programs and the process of tagging. In recent years, this tagging program
has been extended as part of the research for the Longman Grammar of Spoken and
Written English (Biber et al., 1999). The full list of linguistic features included in the
MD studies for the present book is given in Appendix Two.
After linguistic features have been tagged, additional computer programs
tally frequency counts of each feature in each text. These counts are normalized to
a common basis, to enable comparison across the texts. Counts are normed (e.g.,
to their rate of occurrence per 1,000 words of text) before conducting statistical
analyses. (The procedure for normalization is further described in Biber, 1988, pp.
75–76, and in Biber, Conrad, & Reppen, 1998, Methodology Box 6.)
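The normalization step can be illustrated with a minimal sketch. The function name, the feature names, and the counts below are hypothetical and serve only to show the arithmetic of converting raw counts to a rate per 1,000 words; they are not taken from the corpora analyzed in this book.

```python
def normalize_counts(raw_counts, text_length, basis=1000):
    """Convert raw feature counts to rates per `basis` words of text."""
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return {feature: count * basis / text_length
            for feature, count in raw_counts.items()}

# Hypothetical raw counts for two texts of very different lengths.
short_text = normalize_counts({"past_tense": 30, "nouns": 120}, text_length=500)
long_text = normalize_counts({"past_tense": 60, "nouns": 600}, text_length=2000)

print(short_text)  # rates per 1,000 words for the 500-word text
print(long_text)   # rates per 1,000 words for the 2,000-word text
```

Although the longer text contains twice as many past tense verbs in absolute terms, its normalized rate is half that of the shorter text, which is the comparison that matters for the subsequent statistical analyses.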
As described in Section A.1 above, co-occurrence patterns are central to MD
analyses because each dimension represents a different set of co-occurring linguis-
tic features. The statistical technique used for identifying these co-occurrence pat-
terns is known as factor analysis, and each set of co-occurring features is referred
to as a factor. In a factor analysis, a large number of original variables (in this case
the linguistic features) are reduced to a small set of derived, underlying variables:
the factors.
When considering a set of linguistic features, each having its own variance, it
is possible to analyze the pool of shared variance, that is, the extent to which the
features vary in similar ways. Shared variance is directly related to co-occurrence.
If two features tend to be frequent in some texts and rare in other texts, then they
co-occur and have a high amount of shared variance.
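One common way to quantify the shared variance between two features is the squared Pearson correlation of their normalized rates across texts. The sketch below assumes this measure; the feature names and rates are invented for illustration and the helper functions are not part of any MD software.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of rates."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length (at least 2 texts)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def shared_variance(xs, ys):
    """Proportion of variance shared by two features (r squared)."""
    return pearson_r(xs, ys) ** 2

# Hypothetical rates per 1,000 words in five texts.
contractions = [30, 25, 2, 1, 28]
private_verbs = [20, 18, 3, 2, 22]
nominalizations = [5, 8, 40, 45, 6]

print(round(shared_variance(contractions, private_verbs), 3))
print(round(pearson_r(contractions, nominalizations), 3))
```

In these invented data, contractions and private verbs are frequent in the same texts and rare in the same texts, so they share most of their variance. Nominalizations show the opposite distribution, which appears as a strong negative correlation.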
Factor analysis attempts to account for the shared variance among features by
extracting multiple factors, where each factor represents the maximum amount of
shared variance that can be accounted for out of the pool of variance remaining at
that point. Thus, the second factor extracts the maximum amount of shared variance
from the variability left over after the first factor has been extracted, and so on.
Each linguistic feature has some relation to each factor, and the strength of that
relation is represented by factor loadings. (The factor loading represents the amount
of variance that a feature has in common with the total pool of shared variance ac-
counted for by a factor.) Factor loadings can range from 0.0, which shows the ab-
sence of any relationship, to 1.0, which shows a perfect correlation. The factor load-
ing indicates the extent to which one can generalize from a factor to a particular
linguistic feature, or the extent to which a linguistic feature is representative of the
dimension underlying a factor. Put another way, the size of the loading reflects the
strength of the co-occurrence relationship between the feature in question and the
total grouping of co-occurring features represented by the factor.
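The logic of successive extraction can be sketched in a few lines. Note the assumptions: the sketch below substitutes a simple principal-component extraction (power iteration on the correlation matrix, followed by deflation) for the factor analysis and rotation procedures actually used in MD studies, and all names and data are hypothetical. It is meant only to show how each factor accounts for the largest share of the variance remaining after the previous one, and how loadings express each feature's relation to a factor.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of rates."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Correlations between every pair of feature columns."""
    return [[pearson_r(a, b) for b in columns] for a in columns]

def leading_component(matrix, iterations=500):
    """Power iteration: largest eigenvalue and a matching unit vector."""
    size = len(matrix)
    vec = [1.0 / (i + 1) for i in range(size)]  # arbitrary fixed start
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(size))
                   for i in range(size)]
        norm = sqrt(sum(p * p for p in product))
        if norm == 0.0:  # nothing left to extract
            return 0.0, vec
        vec = [p / norm for p in product]
    product = [sum(matrix[i][j] * vec[j] for j in range(size))
               for i in range(size)]
    eigenvalue = sum(v * p for v, p in zip(vec, product))
    return eigenvalue, vec

def extract_loadings(columns, n_factors):
    """Extract factors one at a time from the remaining shared variance."""
    residual = correlation_matrix(columns)
    size = len(residual)
    factors = []
    for _ in range(n_factors):
        eigenvalue, vec = leading_component(residual)
        scale = sqrt(max(eigenvalue, 0.0))
        loadings = [v * scale for v in vec]
        factors.append(loadings)
        # Remove the variance accounted for by this factor (deflation).
        residual = [[residual[i][j] - loadings[i] * loadings[j]
                     for j in range(size)] for i in range(size)]
    return factors

# Hypothetical rates per 1,000 words in five texts, one list per feature:
# contractions, private verbs, nominalizations.
columns = [[30, 25, 2, 1, 28], [20, 18, 3, 2, 22], [5, 8, 40, 45, 6]]
factors = extract_loadings(columns, n_factors=2)
for number, loadings in enumerate(factors, start=1):
    print(number, [round(value, 2) for value in loadings])
```

In this toy example the first factor has large loadings of one sign for contractions and private verbs and a large loading of the opposite sign for nominalizations, while the second factor, extracted from what is left over, has only small loadings.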
Each linguistic feature has a loading (or weight) on each factor. However, when
interpreting a factor, only features with salient or important loadings are considered.
In most MD analyses, features with loadings smaller than .30 in absolute value are disregarded
as unimportant for the interpretation of a factor. Positive and negative signs are not
related to importance; the sign instead identifies two groupings of features that
occur in a complementary pattern as part of the same factor. That is, when the
features with positive loadings occur together frequently in a text, the features
with negative loadings are markedly less frequent in that text, and vice versa. Table
4.4 in Chapter 4 is an example of a factor analysis with both positive and negative
loadings of multiple linguistic features grouped around seven different factors.
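The selection of salient loadings can be sketched as follows. The loadings are hypothetical values chosen for illustration (they are not taken from Table 4.4 or from any published factor analysis), and the function name is likewise an assumption.

```python
def salient_features(loadings, cutoff=0.30):
    """Split features into positive and negative salient groups.

    Features whose loading is smaller than `cutoff` in absolute value
    are disregarded.
    """
    positive = sorted(name for name, weight in loadings.items()
                      if weight >= cutoff)
    negative = sorted(name for name, weight in loadings.items()
                      if weight <= -cutoff)
    return positive, negative

# Hypothetical loadings of five features on one factor.
hypothetical_loadings = {
    "contractions": 0.85,
    "private_verbs": 0.70,
    "nouns": -0.65,
    "prepositions": -0.40,
    "adverbs": 0.12,
}

positive, negative = salient_features(hypothetical_loadings)
print(positive)  # features that tend to occur together
print(negative)  # features in complementary distribution to the first group
```

The sign plays no role in deciding whether a feature is retained; it only determines which of the two complementary groups the feature belongs to.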
Factor interpretations depend on the assumption that linguistic co-occurrence
patterns reflect underlying communicative functions. That is, particular sets of linguistic
features co-occur frequently in texts because they serve related communicative
functions. In the interpretation of a factor, it is important to consider the likely
reasons for the complementary distribution between positive and negative feature
sets as well as the reasons for the co-occurrence patterns within those sets.
The interpretation of a factor as a functional dimension is based on (1) analy-
sis of the communicative function(s) most widely shared by the set of co-occurring
features defining a factor, and (2) analysis of the similarities and differences among
registers with respect to the factor. In order to determine the distribution of regis-
ters along a dimension, we compute dimension scores for each text and then com-
pare texts and registers with respect to those scores.
The frequency counts of individual linguistic features might be considered as
scores that can be used to characterize texts (e.g., a noun score, an adjective score,
etc.). In a similar way, dimension scores (or factor scores) can be computed for each
text by summing the frequencies of the features having salient loadings on that
dimension. For example, the Dimension 1 score for each text in the Biber (1988)
MD analysis was computed by adding together the frequencies of private verbs,
that deletions, contractions, present tense verbs, etc. (the features with positive
loadings on Factor 1, from Table 5), and then subtracting the frequencies of
nouns, word length, prepositions, etc. (the features with negative loadings).
In MD studies, frequencies are standardized to a mean of 0.0 and a standard
deviation of 1.0 before the dimension scores are computed. This process translates
the scores for all features to scales representing standard deviation units. Thus,
regardless of whether a feature is extremely rare or extremely common in absolute
terms, a standard score of +1 represents one standard deviation unit above the
mean score for the feature in question. That is, standardized scores measure
whether a feature is common or rare in a text relative to the overall average occur-
rence of that feature. The raw frequencies are transformed to standard scores so
that all features on a factor will have equivalent weights in the computation of di-
mension scores. If this process were not followed, extremely common features
would have a much greater influence than rare features on the dimension scores.
The methodological steps followed to standardize frequency counts and compute
dimension scores are described more fully in Biber (1988, pp. 93–97).
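The two steps described above, standardization and summation, can be sketched as follows. The sketch assumes that each feature is standardized against the mean and the sample standard deviation of the texts supplied; the feature names and rates are hypothetical, and the function names are not part of any existing MD program.

```python
from statistics import mean, stdev

def standardize(rates):
    """Convert normalized rates to standard scores (mean 0.0, SD 1.0)."""
    center, spread = mean(rates), stdev(rates)
    if spread == 0:
        raise ValueError("feature does not vary across texts")
    return [(rate - center) / spread for rate in rates]

def dimension_scores(rates_by_feature, positive, negative):
    """Sum standard scores of positive features, subtract negative ones."""
    z = {name: standardize(rates) for name, rates in rates_by_feature.items()}
    n_texts = len(next(iter(z.values())))
    return [sum(z[name][i] for name in positive)
            - sum(z[name][i] for name in negative)
            for i in range(n_texts)]

# Hypothetical rates per 1,000 words for three texts.
rates = {
    "contractions": [30.0, 2.0, 16.0],
    "private_verbs": [20.0, 4.0, 12.0],
    "nouns": [150.0, 300.0, 225.0],
}

scores = dimension_scores(rates,
                          positive=["contractions", "private_verbs"],
                          negative=["nouns"])
print(scores)
```

Nouns are far more frequent than contractions in absolute terms, but after standardization each feature contributes on the same scale, so the common feature does not dominate the dimension score.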
Once a dimension score is computed for each text, the mean dimension score
for each register can be computed. Plots of these dimension scores then allow lin-
guistic characterization of any given register, comparison of the relations between
any two registers, and a fuller functional interpretation of the underlying dimen-
sion; standard statistical techniques (such as ANOVA and post-hoc tests like Dun-
can or Scheffe) can be used to determine whether the differences among mean
scores are statistically significant. For example, Figure 7.1 in Chapter 7 uses mean
dimension scores to plot the difference in the use of linguistic features between
introduction, methods, results and discussion sections of research articles.
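As a rough illustration of this final step, the sketch below computes mean dimension scores per register and a one-way ANOVA F statistic by hand. The section labels and scores are invented, and the function is a bare-bones assumption rather than the procedure used for Figure 7.1; obtaining a p-value or running post-hoc comparisons would normally be done with a statistics package (for example, `scipy.stats.f_oneway`).

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for differences among the group means."""
    all_scores = [score for scores in groups.values() for score in scores]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(scores) * (mean(scores) - grand_mean) ** 2
                     for scores in groups.values())
    ss_within = sum((score - mean(scores)) ** 2
                    for scores in groups.values() for score in scores)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical dimension scores for texts from three article sections.
scores_by_section = {
    "introduction": [1.0, 2.0, 3.0],
    "methods": [-3.0, -2.0, -1.0],
    "discussion": [2.0, 3.0, 4.0],
}

section_means = {name: mean(scores)
                 for name, scores in scores_by_section.items()}
f_statistic = one_way_anova_f(scores_by_section)
print(section_means)
print(f_statistic)
```

The F statistic is then compared with the F distribution with (k - 1, n - k) degrees of freedom, where k is the number of registers and n the number of texts.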
The paragraphs above provide a brief introduction to the analytical techniques
used in MD analysis. However, much more could be said about the technical as-
pects of MD methodology, including such matters as rotation techniques in the
factor analysis; the specific procedures required to compute and interpret factors;
the reliability, validity, and significance of dimensions; and representativeness and
sampling in corpus design. Interested readers are referred to Biber (1990, 1993b,
1993c, 1995), Biber, Conrad, and Reppen (1998), and Biber, Conrad, Reppen,
Byrd, and Helt (2003).
Appendix 2
Grammatical and lexico-grammatical features
included in the multi-dimensional analyses
The following list identifies the major grammatical and lexico-grammatical features
identified by the Biber tagger, used for the MD analyses in Chapters 4, 7, and 8.
1. Pronouns and pro-verbs
first person pronouns
second person pronouns
third person pronouns (excluding it)
pronoun it
demonstrative pronouns (this, that, these, those as pronouns)
indefinite pronouns (e.g., anybody, nothing, someone)
pro-verb do
2. Reduced forms and dispreferred structures
contractions
complementizer that deletion (e.g., I think [0] he went)
stranded prepositions (e.g., the candidate that I was thinking of)
split auxiliaries (e.g., they were apparently shown to ...)
3. Prepositional phrases
4. Coordination
phrasal coordination (NOUN and NOUN; ADJ and ADJ; VERB and VERB;
ADV and ADV)
independent clause coordination (clause initial and)
5. WH-Questions
6. Lexical specificity
type/token ratio
word length
7. Nouns
nominalizations (ending in -tion, -ment, -ness, -ity)
nouns
7a. Semantic categories of nouns
animate noun (e.g., teacher, child, person)
cognitive noun (e.g., fact, knowledge, understanding)
concrete noun (e.g., rain, sediment, modem)
technical/concrete noun (e.g., cell, wave, electron)
quantity noun (e.g., date, energy, minute)
place noun (e.g., habitat, room, ocean)
group/institution noun (e.g., committee, bank, congress)
abstract/process nouns (e.g., application, meeting, balance)
8. Verbs
8a. Tense and aspect markers
past tense
perfect aspect verbs
non-past tense
8b. Passives
agentless passives
by passives
8c. Modals
possibility/permission/ability modals (can, may, might, could)
necessity/obligation modals (ought, must, should)
predictive/volition modals (will, would, shall)
8d. Semantic categories of verbs
be as main verb
activity verb (e.g., smile, bring, open)
communication verb (e.g., suggest, declare, tell)
mental verb (e.g., know, think, believe)
causative verb (e.g., let, assist, permit)
occurrence verb (e.g., increase, grow, become)
existence verb (e.g., possess, reveal, include)
aspectual verb (e.g., keep, begin, continue)
8e. Phrasal verbs
intransitive activity phrasal verb (e.g., come on, sit down)
transitive activity phrasal verb (e.g., carry out, set up)
transitive mental phrasal verb (e.g., find out, give up)
transitive communication phrasal verb (e.g., point out)
intransitive occurrence phrasal verb (e.g., come off, run out)
copular phrasal verb (e.g., turn out)
aspectual phrasal verb (e.g., go on)
9. Adjectives
attributive adjectives
predicative adjectives
9a. Semantic categories of adjectives
size attributive adjectives (e.g., big, high, long)
time attributive adjectives (e.g., new, young, old)
color attributive adjectives (e.g., white, red, dark)
evaluative attributive adjectives (e.g., important, best, simple)
relational attributive adjectives (e.g., general, total, various)
topical attributive adjectives (e.g., political, economic, physical)
10. Adverbs and adverbials
place adverbials
time adverbials
10a. Adverb classes
conjuncts (e.g., consequently, furthermore, however)
downtoners (e.g., barely, nearly, slightly)
hedges (e.g., at about, something like, almost)
amplifiers (e.g., absolutely, extremely, perfectly)
emphatics (e.g., a lot, for sure, really)
discourse particles (e.g., sentence initial well, now, anyway)
other adverbs
10b. Semantic categories of stance adverbs
non-factive adverbs (e.g., frankly, mainly, truthfully)
attitudinal adverbs (e.g., surprisingly, hopefully, wisely)
factive adverbs (e.g., undoubtedly, obviously, certainly)
likelihood adverbs (e.g., evidently, predictably, roughly)
11. Adverbial subordination
causative adverbial subordinator (because)
conditional adverbial subordinator (if, unless)
other adverbial subordinator (e.g., since, while, whereas)
12. Nominal post-modifying clauses
that relatives (e.g., the dog that bit me, the dog that I saw)
WH relatives on object position (e.g., the man who Sally likes)
WH relatives on subject position (e.g., the man who likes popcorn)
WH relatives with fronted preposition (e.g., the manner in which he was told)
past participial postnominal (reduced relative) clauses (e.g., the solution pro-
duced by this process)
13. That complement clauses
13a. That clauses controlled by a verb
(e.g., we predict that the water is here)
non-factive verb (e.g., imply, report, suggest)
attitudinal verb (e.g., anticipate, expect, prefer)
factive verb (e.g., demonstrate, realize, show)
likelihood verb (e.g., appear, hypothesize, predict)
13b. That clauses controlled by an adjective
(e.g., it is strange that he went there)
attitudinal adjectives (e.g., good, advisable, paradoxical)
likelihood adjectives (e.g., possible, likely, unlikely)
13c. That clauses controlled by a noun
(e.g., the proposal that he put forward was accepted)
non-factive noun (e.g., comment, proposal, remark)
attitudinal noun (e.g., hope, reason, view)
factive noun (e.g., assertion, observation, statement)
likelihood noun (e.g., assumption, implication, opinion)
14. WH-clauses
15. To-clauses
15a. To-clauses controlled by a verb (e.g., He offered to stay)
speech act verb (e.g., urge, report, convince)
cognition verb (e.g., believe, learn, pretend)
desire/intent/decision verb (e.g., aim, hope, prefer)
modality/cause/effort verb (e.g., allow, leave, order)
probability/simple fact verb (e.g., appear, happen, seem)
15b. To-clauses controlled by an adjective
certainty adjectives (e.g., prone, due, apt)
ability/willingness adjectives (e.g., competent, hesitant)
personal affect adjectives (e.g., annoyed, nervous)
ease/difficulty adjectives (e.g., easy, impossible)
evaluative adjectives (e.g., convenient, smart)
15c. To-clauses controlled by a noun (e.g., agreement, authority, intention)
References
Abelen, E., Redecker, G., & Thompson, S. (1993). The rhetorical structure of US-American and
Dutch fund-raising letters. Text, 13(3), 323–350.
Aijmer, K. (2002). English Discourse Particles: Evidence from a Corpus. Amsterdam: John Ben-
jamins.
Anthony, L. (1999). Writing research article introductions in software engineering: How accurate
is a standard model? IEEE Transactions on Professional Communication, 42, 38–46.
Aristotle. (1932). The Rhetoric of Aristotle (L. D. Cooper, Trans.). New York NY: Appleton and
Company.
Aristotle. (1984). Rhetoric. In J. Barnes (ed.), The Complete Works of Aristotle (rev. Oxford ed.,
Vol. 2, pp. 2152–2269). Princeton NJ: Princeton University Press.
Arnold, C. (1982). Introduction (W. Kluback, Trans.). In C. Perelman (ed.), The Realm of Rheto-
ric. Notre Dame IN: University of Notre Dame Press.
Baker, P. (2006). Using Corpora in Discourse Analysis. London: Continuum.
Barton, E. (1993). Evidentials, argumentation, and epistemological stance. College English, 55,
745–769.
Bateman, J., & Rondhuis, K. J. (1997). Coherence relations: Towards a general specification.
Discourse Processes, 24(1), 3–50.
Bazerman, C. (1988). Shaping Written Knowledge: The Genre and Activity of the Experimental
Article in Science. Madison WI: University of Wisconsin Press.
Bazerman, C. (1994). Systems of genres and the enactment of social intentions. In A. Freedman
& P. Medway (eds.), Genre and the New Rhetoric (pp. 79–104). London: Taylor & Francis.
Bazerman, C. (1997a). Some information comments on texts mediating fund-raising relationships:
Cultural sites of affiliation. In Written Discourse in Philanthropic Fund Raising. Issues
of Language and Rhetoric. In Working Papers, 98–13 (pp. 17–26).
Bazerman, C. (1997b). The life of genre, the life in the classroom. In W. Bishop & H. Ostrum
(eds.), Genre and Writing (pp. 19–26). Portsmouth NH: Boynton/Cook.
Bazerman, C., & Prior, P. (eds.). (2004). What Writing Does and How it Does it: An Introduction
to Analyzing Texts and Textual Practices. Mahwah NJ: Lawrence Erlbaum Associates.
Beach, R., & Anson, C. (1992). Stance and intertextuality in written discourse. Linguistics and
Education, 4, 335–357.
Berkenkotter, C., & Huckin, T. (1995). Genre Knowledge in Disciplinary Communication: Cogni-
tion/Culture/Power. Hillsdale NJ: Lawrence Erlbaum.
Bhatia, V. (1993a). Analyzing Genre: Language Use in Professional Settings. London: Longman.
Bhatia, V. (1993b). Simplification vs. easification: The case of legal texts. Applied Linguistics, 4(1),
42–54.
Bhatia, V. (1997a). Applied genre analysis and ESP. In T. Miller (ed.), Functional Approaches to
Written Texts: Classroom Applications (pp. 134–149). Washington, DC: USIA.
Bhatia, V. (1997b). Discourse of philanthropic fund-raising. Paper presented at the Written discourse
in philanthropic fund raising. Issues of language and rhetoric, Indianapolis IN.
Bhatia, V. (1998). Generic patterns in fundraising discourse. New Directions for Philanthropic
Fundraising, 22, 95–110.
Bhatia, V. (2002). A generic view of academic discourse. In J. Flowerdew (ed.), Academic Discourse
(pp. 21–39). New York NY: Longman.
Bhatia, V. (2004). Worlds of Written Discourse: A Genre-based View. New York NY: Continuum.
Biber, D. (1986). Spoken and written textual dimensions in English: Resolving the contradictory
findings. Language, 62, 384–414.
Biber, D. (1988). Variation Across Speech and Writing. Cambridge: CUP.
Biber, D. (1989). A typology of English texts. Linguistics, 27, 3–43.
Biber, D. (1990). Methodological issues regarding corpus-based analyses of linguistic variation.
Literary and Linguistic Computing, 5, 257–269.
Biber, D. (1992). Using computer-based text corpora to analyze the referential strategies of spoken
and written texts. In J. Svartvik (ed.), Directions in Corpus Linguistics: Proceedings of
Nobel Symposium 82, Stockholm, 4–8 August 1991 (pp. 213–252). Berlin: Mouton.
Biber, D. (1993a). Representativeness in corpus design. Literary and Linguistic Computing, 8,
1–15.
Biber, D. (1993b). Using register-diversified corpora for general language studies. Computational
Linguistics, 19, 219–241.
Biber, D. (1993c). The multi-dimensional approach to linguistic analyses of genre variation: An
overview of methodology and findings. Computers and the Humanities, 26, 331–345.
Biber, D. (1995). Dimensions of Register Variation: A Cross-linguistic Comparison. Cambridge:
CUP.
Biber, D. (2003). Variation among spoken and written registers: A new multi-dimensional anal-
ysis. In P. Leistyna & C. Meyer (eds.), Corpus Analysis: Language Structure and Language
Use. Amsterdam: Rodopi.
Biber, D. (2004). Historical patterns for the grammatical marking of stance: A cross-register
comparison. Journal of Historical Pragmatics, 5, 107–135.
Biber, D. (2006a). Stance in spoken and written university registers. Journal of English for Academic
Purposes, 5, 97–116.
Biber, D. (2006b). University Language: A Corpus-based Study of Spoken and Written Registers.
Amsterdam: John Benjamins.
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus Linguistics: Investigating Language Structure
and Use. Cambridge: CUP.
Biber, D., Conrad, S., Reppen, R., Byrd, P., & Helt, M. (2002). Speaking and writing in the university:
A multidimensional analysis. TESOL Quarterly, 36, 9–48.
Biber, D., Conrad, S., Reppen, R., Byrd, P., & Helt, M. (2003). Strengths and goals of multidimensional
analysis: A response to Ghadessy. TESOL Quarterly, 37, 151–155.
Biber, D., Csomay, E., Jones, J., & Keck, C. (2004). A corpus linguistic investigation of vocabulary-based
discourse units in university registers. In U. Connor & T. Upton (eds.), Applied
Corpus Linguistics: A Multi-dimensional Perspective (pp. 53–72). Amsterdam: Rodopi.
Biber, D., & Finegan, E. (1988). Adverbial stance types in English. Discourse Processes, 11,
1–34.
Biber, D., & Finegan, E. (1989). Styles of stance in English: lexical and grammatical marking of
evidentiality and affect. The Politics of Language Purism, 250.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman Grammar of Spo-
ken and Written English. London: Pearson Education.
Brett, P. (1994). A genre analysis of the results section of sociology articles. English for Specific
Purposes, 13(1), 47–59.
Brown, G., & Yule, G. (1983). Discourse Analysis. Cambridge: CUP.
Bruthiaux, P. (1994). Me Tarzan, you Jane: Linguistic simplification in personal ads register. In
D. Biber & E. Finegan (eds.), Sociolinguistic Perspectives on Register (pp. 136–154). New
York NY: OUP.
Bruthiaux, P. (1996). The Discourse of Classified Advertising: Exploring the Nature of Linguistic
Simplicity. New York NY: OUP.
Bunton, D. (2002). Generic moves in Ph.D. thesis introductions. In J. Flowerdew (ed.), Academic
Discourse (pp. 57–75). London: Pearson Education.
Callow, K., & Callow, J. (1992). Text as purposive communication: A meaning-based analysis. In
W. Mann & S. Thompson (eds.), Discourse Description: Diverse Linguistic Analyses of a
Fund-raising Text (pp. 5–37). Amsterdam: John Benjamins.
Capozzoli, M., McSweeney, L., & Sinha, D. (1999). Beyond kappa: A review of interrater agreement
measures. The Canadian Journal of Statistics, 27(1), 3–23.
Cazden, C. (1986). Language in the classroom. In R. Kaplan (ed.), Annual Review of Applied
Linguistics (Vol. 7). Rowley MA: Newbury House.
Chafe, W. (1986). Evidentiality in English conversation and academic writing. In W. Chafe & J.
Nichols (eds.), Evidentiality: The Linguistic Coding of Epistemology (pp. 261–272). Norwood
NJ: Ablex.
Chafe, W. (1994). Discourse, Consciousness, and Time. Chicago IL: University of Chicago Press.
Chafe, W. (1997). Polyphonic topic development. In T. Givón (ed.), Conversation: Cognitive,
Communicative and Social Perspectives. Amsterdam: John Benjamins.
Chafe, W., & Nichols, J. (eds.). (1986). Evidentiality: The Linguistic Coding of Epistemology. Nor-
wood NJ: Ablex.
Chaudron, C. (1988). Second Language Classrooms: Research on Teaching and Learning. Cam-
bridge: CUP.
Chu, B. (1996). Introductions in state-of-the-art, argumentative, and teaching tips TESL journal
articles: Three possible sub-genres of introductions? Unpublished Research Monograph
No. 12. City University of Hong Kong.
Collins, P. (1991). Cleft and the Pseudo-cleft Constructions in English. London: Routledge.
Collins, P. (1995). The indirect object construction in English: An information approach.
Linguistics, 33, 35–49.
Cone, A. L. (1987). How to Create and Use Solid Gold Fund-raising Letters. Ambler PA: Fund-
Raising Institute.
Connor, U. (1987). Argumentative patterns in student essays: Cross-cultural differences. In U.
Connor & R. Kaplan (eds.), Writing Across Languages: Analysis of L2 Text (pp. 73–87).
Reading MA: Addison-Wesley.
Connor, U. (1996). Contrastive Rhetoric. Cross-cultural Aspects of Second-language Writing.
Cambridge: CUP.
Connor, U. (1997). Comparing research and not-for-profit grant proposals. In Written Discourse
in Philanthropic Fund Raising: Issues of Language and Rhetoric (Working Papers, 98–13,
pp. 45–64). Indianapolis IN: Indiana University Center on Philanthropy.
Connor, U. (2000). Variation in rhetorical moves in grant proposals of US humanists and scientists.
Text, 20(1), 1–28.
Connor, U., Davis, K., & De Rycker, T. (1995). Correctness and clarity in applying for overseas
jobs: A cross-cultural analysis of US and Flemish applications. Text, 15(4), 457–475.
Connor, U., & Gladkov, K. (2004). Rhetorical appeals in fundraising direct mail letters. In U.
Connor & T. Upton (eds.), Discourse in the Professions: Perspectives from Corpus Linguistics
(pp. 257–286). Amsterdam: John Benjamins.
Connor, U., & Lauer, J. (1985). Understanding persuasive essay writing: Linguistic/rhetorical
approach. Text, 5(4), 309–326.
Connor, U., & Mauranen, A. (1999). Linguistic analysis of grant proposals: European Union
research grants. English for Specific Purposes, 18(1), 47–62.
Connor, U., Precht, K., & Upton, T. (2002). Business English: Learner data from Belgium, Finland,
and the U.S. In S. Granger, J. Hung & S. Petch-Tyson (eds.), Computer Learner Corpora,
Second Language Acquisition, and Foreign Language Teaching (pp. 175–194). Amsterdam:
John Benjamins.
Connor, U., & Upton, T. (2003). Linguistic dimensions of direct mail letters. In C. Meyer & P.
Leistyna (eds.), Corpus Analysis: Language Structure and Language Use (pp. 71–86). Amsterdam:
Rodopi.
Connor, U., & Upton, T. (2004a). The genre of grant proposals: A corpus linguistic analysis. In
U. Connor & T. Upton (eds.), Discourse in the Professions: Perspectives from Corpus Linguistics
(pp. 235–256). Amsterdam: John Benjamins.
Connor, U., & Upton, T. (eds.). (2004b). Discourse in the Professions: Perspectives from Corpus
Linguistics. Amsterdam: John Benjamins.
Connor, U., & Wagner, L. (1998). Language use in grant proposals by nonprofits: Spanish and
English. New Directions for Philanthropic Fundraising, 22, 59–73.
Conrad, S. (2001). Variation among disciplinary texts: A comparison of textbooks and journal
articles in biology and history. In S. Conrad & D. Biber (eds.), Variation in English:
Multi-Dimensional Studies (pp. 94–107). London: Longman.
Conrad, S., & Biber, D. (2000). Adverbial marking of stance in speech and writing. In S. Hunston
& G. Thompson (eds.), Evaluation in Text (pp. 56–73). Oxford: OUP.
Conrad, S., & Biber, D. (eds.). (2001). Variation in English: Multi-Dimensional Studies. London:
Longman.
Cooper, A. (1988). Given-new: Enhancing coherence through cohesiveness. Written Communication,
5, 352–367.
Corbett, E. (1965). Classical Rhetoric for the Modern Student. New York NY: OUP.
Coulthard, M. (ed.). (1994). Advances in Written Text Analysis. London: Routledge.
Couture, B. (1986). Effective ideation in written text: A functional approach to clarity and exigence.
In B. Couture (ed.), Functional Approaches to Writing: Research Perspectives (pp.
69–91). Norwood NJ: Ablex.
Crismore, A. (1997). Visual rhetoric in an Indiana University Foundation Annual Report. In
Written Discourse in Philanthropic Fund Raising. Issues of Language and Rhetoric. In Working
Papers, 98–13 (pp. 64–100). Indianapolis, IN.
Crookes, G. (1986). Towards a validated analysis of scientific text structure. Applied Linguistics,
7(1), 57–70.
Crossley, S. (2007). A chronotopic approach to genre analysis: An exploratory study. English for
Specific Purposes, 26(1), 4–24.
Csomay, E. (2002). Episodes in University Classrooms: A Corpus Linguistic Investigation. Unpublished
Ph.D. Dissertation. Flagstaff, AZ: Northern Arizona University.
Csomay, E. (2005a). Linguistic variation in the lexical episodes of university classroom talk. In
A. Tyler, M. Takada, Y. Kim & D. Marinova (eds.), Language in Use: Cognitive and Discourse
Perspectives on Language and Language Learning. Georgetown University Round Table on
Languages and Linguistics (pp. 150–162). Washington DC: Georgetown University Press.
Csomay, E. (2005b). Linguistic variation within university classroom talk: A corpus-based perspective.
Linguistics and Education, 15, 243–274.
Csomay, E. (2006). Academic talk in American classrooms: Crossing the boundaries of oral-literate
discourse? Journal of English for Academic Purposes, 5, 117–135.
Dahlgren, K. (1996). Discourse coherence and segmentation. In E. Hovy & D. Scott (eds.), Computational
Discourse: Burning Issues – An Interdisciplinary Account (NATO ASI Series, Series
F: Computer and Systems Sciences, Vol. 151). Heidelberg: Springer-Verlag.
de Haan, P. (1989). Postmodifying Clauses in the English Noun Phrase: A Corpus-based Study.
Amsterdam: Rodopi.
Dubois, B. (1997). The Biomedical Discussion Section in Context. Greenwich CT: Ablex.
Dudley-Evans, T. (1994a). Genre analysis: An approach to text analysis for ESP. In M. Coulthard
(ed.), Advances in Written Text Analysis (pp. 219–228). London: Routledge.
Dudley-Evans, T. (1994b). Variation in the discourse patterns favoured by different disciplines
and their pedagogical implications. In J. Flowerdew (ed.), Academic Listening: Research Perspectives
(pp. 146–157). New York NY: CUP.
Dudley-Evans, T. (1995). Genre models for the teaching of academic writing to second language
speakers: Advantages and disadvantages. The Journal of TESOL France, 2(2), 181–193.
Everitt, B. (1974). Cluster Analysis. New York NY: Wiley.
Feng, H. (Forthcoming). A genre-based study of research grant proposals in China. In U. Con-
nor, E. Nagelhout & W. Rozycki (eds.), Contrastive Rhetoric: Reaching to Intercultural Rhet-
oric. Amsterdam: John Benjamins.
Ferguson, C. A. (1983). Sports announcer talk: Syntactic aspects of register variation. Language
in Society, 12, 153–172.
Ferguson, C. A. (1994). Dialect, register, and genre: Working assumptions about conventionalization.
In D. Biber & E. Finegan (eds.), Sociolinguistic Perspectives on Register (pp. 15–30).
New York NY: OUP.
Finegan, E., & Biber, D. (2001). Register variation and social dialect variation: The register axiom.
In P. Eckert & J. Rickford (eds.), Style and Sociolinguistic Variation (pp. 235–267). Cambridge:
CUP.
Flowerdew, J. (1993). An educational, or process, approach to the teaching of professional genres.
ELT Journal, 47(4), 305–316.
Fortanet, I. (2004). The use of we in university lectures: Reference and function. English for
Specific Purposes, 23, 45–66.
Fox, B. A. (1987). Discourse Structure and Anaphora. Written and Conversational English. Cambridge:
CUP.
Fox, B. A., & Thompson, S. A. (1990). A discourse explanation of the grammar of relative clauses
in English conversation. Language, 66, 297–316.
Gee, J. P. (1986). Units in the production of narrative discourse. Discourse Processes, 9(4), 391–422.
Geisler, C. (1995). Relative Infinitives in English. Uppsala: Uppsala University.
Givón, T. (ed.). (1983). Topic Continuity in Discourse. Amsterdam: John Benjamins.
Grabe, W., & Kaplan, R. (1996). Theory and Practice of Writing. New York NY: Longman.
Granger, S. (1983). The be + past participle Construction in Spoken English, with Special Empha-
sis on the Passive. Amsterdam: North Holland.
Graves, R. (1997). Dear friend (?): Culture and genre in American and Canadian direct marketing
letters. Journal of Business Communication, 34(3), 235–252.
Grimes, J. (1975). The Thread of Discourse. The Hague: Mouton.
Grosz, B. J., & Sidner, C. L. (1986). Attention, intentions and the structure of discourse. Computational
Linguistics, 12, 175–204.
Halliday, M. (1989). Spoken and Written Language. Oxford: OUP.
Halliday, M., & Hasan, R. (1976). Cohesion in English. London: Longman.
Hansen, C. (1994). Topic identification in lecture discourse. In J. Flowerdew (ed.), Academic
Listening: Research Perspectives (pp. 131–145). New York: CUP.
Hearst, M. (1994). Multi-paragraph segmentation of expository texts (Technical Report 94/790,
Computer Science Division (EECS)). Berkeley CA: University of California.
Hearst, M. (1997). TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational
Linguistics, 23(1), 33–64.
Heath, S. B., & Langman, J. (1994). Shared thinking and the register of coaching. In D. Biber &
E. Finegan (eds.), Sociolinguistic Perspectives on Register (pp. 82–105). New York: OUP.
Henry, A., & Roseberry, R. (1996). Using a small corpus to obtain data for teaching a genre. In
M. Ghadessy, A. Henry & R. Roseberry (eds.), Small corpus studies and ELT: Theory and
practice (pp. 93–133). Amsterdam: John Benjamins.
Hewings, M., & Hewings, A. (2002). It is interesting to note that...: A comparative study of anticipatory
it in student and published writing. English for Specific Purposes, 21, 367–383.
Hobbs, J. (1979). Coherence and coreference. Cognitive Science, 10(3), 67–90.
Hoey, M. (1983). On the Surface of Discourse. London: Allen and Unwin.
Hoey, M. (1986). Overlapping patterns of discourse organization and their implications for
clause relational analysis in problem-solution text. In C. R. Cooper & S. Greenbaum (eds.),
Studying Writing: Linguistic Approaches (pp. 187–214). Newbury Park, CA: Sage.
Hoey, M. (1991). Patterns of Lexis in Text. Oxford: OUP.
Holmes, J. (1988). Doubt and certainty in ESL textbooks. Applied Linguistics, 9, 20–44.
Hopkins, A., & Dudley-Evans, T. (1988). A genre-based investigation of the discussion sections
in articles and dissertations. English for Specific Purposes, 7(2), 113–122.
Horn, B. (2005). Quantitative and qualitative approaches to text structure analysis: A compari-
son of two methods. Ph.D. seminar research paper, Northern Arizona University.
Hunston, S. (1994). Evaluation and organization in a sample of written academic discourse. In
M. Coulthard (ed.), Advances in Written Text Analysis (pp. 191–218). London: Routledge.
Hunston, S., & Thompson, G. (eds.). (2000). Evaluation in Text: Authorial Stance and the Con-
struction of Discourse. New York: OUP.
Hunston, S. (2002). Corpora in Applied Linguistics. Cambridge: CUP.
Hyland, K. (1996a). Talking to the academy: Forms of hedging in science research articles. Written
Communication, 13, 251–281.
Hyland, K. (1996b). Writing without conviction? Hedging in science research articles. Applied
Linguistics, 17, 433–454.
Hyland, K. (1998). Hedging in Scientific Research Articles. Amsterdam: John Benjamins.
Hyland, K. (1999a). Academic attribution: Citation and the construction of disciplinary knowledge.
Applied Linguistics, 20(3), 341–367.
Hyland, K. (1999b). Disciplinary discourses: Writer stance in research articles. In C. Candlin &
K. Hyland (eds.), Writing: Texts, Processes and Practices (pp. 122–142). London: Longman.
Hyland, K. (2000). Disciplinary Discourses: Social Interaction in Academic Genres. London:
Longman.
Hyland, K. (2001). Bringing in the reader: Addressee features in academic articles. Written Communication,
18(4), 549–574.
Hyland, K. (2002a). Authority and invisibility: Authorial identity in academic writing. Journal of
Pragmatics, 34(8), 1091–1112.
Hyland, K. (2002b). Directives: Argument and engagement in academic writing. Applied Linguistics,
23(2), 215–239.
Hyland, K. (2004a). A convincing argument: Corpus analysis and academic persuasion. In U.
Connor & T. Upton (eds.), Discourse in the Professions: Perspectives from Corpus Linguistics
(pp. 87–112). Amsterdam: John Benjamins.
Hyland, K. (2004b). Disciplinary interactions: Metadiscourse in L2 postgraduate writing. Journal
of Second Language Writing, 13(2), 133–151.
Hymes, D. (1984). Sociolinguistics: Stability and consolidation. International Journal of the Sociology
of Language, 45, 39–45.
Jaworski, A., & Coupland, N. (1999). Introduction: Perspectives on discourse analysis. In A. Jaworski
& N. Coupland (eds.), The Discourse Reader (pp. 1–44). London: Routledge.
Johansson, C. (1995). The Relativizers Whose and Of Which in Present-Day English: Description
and Theory. Uppsala: Uppsala University.
Journal Citation Reports. (2004). Philadelphia PA: Thomson.
Kanoksilapatham, B. (2003). A corpus-based investigation of biochemistry research articles:
Linking move analysis with multidimensional analysis. Unpublished Ph.D. Dissertation.
Georgetown University.
Kanoksilapatham, B. (2005). Rhetorical structure of biochemistry research articles. English for
Kinneavy, J. (1971). Theory of Discourse. Englewood Cliffs NJ: Prentice-Hall.
Korolija, N., & Linell, P. (1996). Episodes: Coding and analyzing coherence in multiparty conversation.
Linguistics, 34, 799–831.
Kress, G., & van Leeuwen, T. (1990). Reading Images. The Grammar of Visual Design. London:
Routledge.
Kress, G., & van Leeuwen, T. (2001). Multimodal Discourse. The Modes and Media of Contempo-
rary Communication. London: Arnold.
Kwan, B. (2006). The schematic structure of literature reviews in doctoral theses of applied linguistics.
English for Specific Purposes, 25(1), 30–55.
Labov, W. (1984). Intensity. In D. Schiffrin (ed.), Meaning, Form, and Use in Context: Linguistic
Applications (pp. 43–70). Washington DC: Georgetown University Press.
Labov, W., & Waletsky, J. (1967). Narrative analysis: Oral versions of personal experience. In J.
Helm (ed.), Essays on the Verbal and Visual Arts: Proceedings of the 1966 Annual Spring
Meeting of the American Ethnological Society (pp. 12–14). Seattle WA: University of Washington
Press.
Lauer, J. (1997). Fundraising letters. In Written Discourse in Philanthropic Fund Raising. Issues of
Language and Rhetoric. In Working Papers, 98–13 (pp. 101–108). Indianapolis IN.
Lee, D. (2001). Genres, registers, text types, domains, and styles: Clarifying the concepts and
navigating a path through the BNC jungle. Language Learning and Technology, 5, 37–72.
Lewin, B., Fine, J., & Young, L. (2001). Expository Discourse: A Genre-based Approach to Social
Science Research Texts. New York NY: Continuum.
Lewis, H. (1997). Direct mail fund raising tactics. Direct Marketing, 59, 28–30.
Lindquist, H., & Mair, C. (eds.). (2004). Corpus Approaches to Grammaticalization in English.
Amsterdam: John Benjamins.
Long, M., & Sato, C. (1983). Classroom foreigner talk discourse: Forms and functions of teachers'
questions. In H. Seliger & M. Long (eds.), Classroom Oriented Research in Second Language
Acquisition. Rowley MA: Newbury House.
Loukianenko, M. (Forthcoming). Different cultures – different discourses? Rhetorical patterns
of business letters by English and Russian speakers. In U. Connor, E. Nagelhout & W. Rozycki
(eds.), Contrastive Rhetoric: Reaching to Intercultural Rhetoric. Amsterdam: John Benjamins.
Love, A. (2002). Introductory concepts and cutting edge theories: Can the genre of the textbook
accommodate both? In J. Flowerdew (ed.), Academic Discourse (pp. 76–92). New York NY:
Longman.
Mair, C. (1990). Infinitival Complement Clauses in English. New York: CUP.
Mandler, J. M., & Johnson, N. S. (1977). Remembrance of things parsed: Story structure and
recall. Cognitive Psychology, 9, 111–151.
Mann, W., Matthiessen, C., & Thompson, S. (1992). Rhetorical structure theory and text analy-
sis. In W. Mann & S. Thompson (eds.), Discourse Description: Diverse Linguistic Analyses of
a Fund-raising Text. Amsterdam: John Benjamins.
Mann, W., & Thompson, S. (1988). Rhetorical structure theory: Towards a functional theory of
text organization. Text, 8(3), 243–282.
Mann, W., & Thompson, S. (eds.). (1992). Discourse Description: Diverse Linguistic Analyses of a
Fund-raising Text. Amsterdam: John Benjamins.
Marcu, D. (2000). The Theory and Practice of Discourse Parsing and Summarization. Cambridge
MA: The MIT Press.
Martin, J. R. (1985). Process and text: Two aspects of human semiosis. In J. D. Benson & W. S.
Greaves (eds.), Systemic Perspectives on Discourse (Vol. 1, pp. 248–274). Norwood NJ:
Ablex.
Martin, J. R., & Rothery, J. (1986). What a functional approach can show teachers. In B. Couture
(ed.), Functional Approaches to Writing: Research Perspectives (pp. 241–265). Norwood NJ:
Ablex.
Marton, F., & Tsui, A. (2004). Classroom Discourse and the Space of Learning. Mahwah NJ: Law-
rence Erlbaum.
Mauranen, A. (2001). Reflexive academic talk. In R. Simpson & J. Swales (eds.), Corpus linguistics
in North America: Selections for the 1999 Symposium. Ann Arbor MI: The University of
Michigan Press.
McBride, K. (Forthcoming). English web page use in an EFL setting: A contrastive rhetoric view
of the development of information literacy. In U. Connor, E. Nagelhout & W. Rozycki (eds.),
Contrastive Rhetoric: Reaching to Intercultural Rhetoric. Amsterdam: John Benjamins.
McCagg, P. (1997). Metaphorical morality and the discourse of philanthropy. In Written
Discourse in Philanthropic Fund Raising. Issues of Language and Rhetoric. In Working Papers,
98-13 (pp. 109–120). Indianapolis IN.
McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based Language Studies. London: Routledge.
McNamara, D. S., & Kintsch, W. (1996). Learning from texts: Effects of prior knowledge and text
coherence. Discourse Processes, 22, 247–288.
Mehan, H. (1979). Learning Lessons: Social Organization in the Classroom. Cambridge MA: Har-
vard University Press.
Meyer, C. (1985). Prose analysis: Purposes, procedures, and problems. In B. K. Britton & J. B.
Black (eds.), Understanding Expository Text: A Theoretical and Practical Handbook for Ana-
lyzing Explanatory Text. Hillsdale NJ: Lawrence Erlbaum Associates.
Meyer, C. (1992). Apposition in Contemporary English. Cambridge: CUP.
Meyer, C., & Leistyna, P. (eds.). (2003). Corpus Analysis: Language Structure and Language Use.
Amsterdam: Rodopi.
Miller, C. (1984). Genre as a social action. Quarterly Journal of Speech, 70, 157–178.
Morris, J., & Hirst, G. (1991). Lexical cohesion computed by thesaural relations as an indicator
of the structure of text. Computational Linguistics, 18, 537–544.
Myers, G. (1997). Wednesday morning and the millennium: Notes on time in fund-raising texts.
In Written Discourse in Philanthropic Fund Raising. Issues of Language and Rhetoric. In
Working Papers, 98-13 (pp. 121–134). Indianapolis IN.
Myhill, J. (1995). Change and continuity in the functions of the American English modals.
Linguistics, 33, 157–211.
Myhill, J. (1997). Should and ought: The rise of individually oriented modality in American
English. English Language and Linguistics, 1, 3–23.
Naczi, R., Reznicek, A., & Ford, B. (1998). Morphological, geographical, and ecological
differentiation in the Carex willdenowii complex (Cyperaceae). American Journal of Botany, 85, 434–447.
Nattinger, J., & DeCarrico, J. (1992). Lexical Phrases. New York: CUP.
Nesselhauf, N. (2005). Collocations in a Learner Corpus. Amsterdam: John Benjamins.
Nwogu, K. (1991). Structure of science popularizations: A genre-analysis approach to the
schema of popularized medical texts. English for Specific Purposes, 10(2), 111–123.
Nwogu, K. (1997). The medical research paper: Structure and functions. English for Specific
Purposes, 16(2), 119–138.
Ochs, E. (Ed.). (1989). The Pragmatics of Affect. Special Edition of Text, 9(3).
Orwin, R. G. (1994). Evaluating coding decisions. In H. Cooper & L. Hedges (Eds.), The
Handbook of Research Synthesis (pp. 139–162). New York NY: Russell Sage Foundation.
Pak, C., & Acevedo, R. (Forthcoming). Spanish language newspaper editorials from Mexico,
Spain, and the U.S. In U. Connor, E. Nagelhout & W. Rozycki (eds.), Contrastive Rhetoric:
Reaching to Intercultural Rhetoric. Amsterdam: John Benjamins.
Paltridge, B. (1994). Genre analysis and the identification of textual boundaries. Applied
Linguistics, 15(3), 288–299.
Pang, T. (2002). Textual analysis and contextual awareness building: A comparison of two
approaches to teaching genre. In A. Johns (ed.), Genre in the Classroom: Multiple Perspectives
(pp. 145–161). Mahwah NJ: Lawrence Erlbaum.
Partington, A. (2003). The Linguistics of Political Argument. The Spin-doctor and the Wolf-pack at
the White House. London: Routledge.
Passonneau, R., & Litman, D. J. (1996). Empirical analysis of three dimensions of spoken dis-
course: segmentation, coherence, and linguistic devices. In E. H. Hovy & D. R. Scott (eds.),
Computational and Conversational Discourse (NATO ASI Series, Series F Computer and
Systems Series, Vol. 151). New York NY: Springer Verlag.
Passonneau, R., & Litman, D. J. (1997). Discourse segmentation by human and automated
means. Computational Linguistics, 23(1), 103–140.
Peng, J. (1987). Organisational features in chemical engineering research articles. ELR Journal,
1, 79–116.
Perelman, C. (1982). The Realm of Rhetoric (W. Kluback, Trans.). Notre Dame IN: University of
Notre Dame Press.
Phillips, M. K. (1985). Aspects of Text Structure: An Investigation of the Lexical Organization of
Text. Amsterdam: North-Holland.
Polanyi, L. (1985). Telling the American Story: A Structural and Cultural Analysis of Storytelling.
Norwood NJ: Ablex.
Polanyi, L. (1988). A formal model of the structure of discourse. Journal of Pragmatics, 12, 601–638.
Poole, D. (2005). Cross-cultural variation in classroom turn-taking practices. In P. Bruthiaux, D.
Atkinson, W. Grabe & V. Ramanathan (eds.), Directions in Applied Linguistics. Buffalo: Mul-
tilingual Matters.
Posteguillo, S. (1999). The schematic structure of computer science research articles. English for
Specific Purposes, 18(2), 139–158.
Precht, K. (2000). Patterns of stance in English. Ph.D. dissertation, Northern Arizona University.
Prince, E. F. (1978). A comparison of wh-clefts and it-clefts in discourse. Language, 54, 883–906.
Prince, E. F. (1981). Toward a taxonomy of given-new information. In P. Cole (ed.), Radical
Pragmatics. New York NY: Academic Press.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. (1985). A Comprehensive Grammar of the
English Language. London: Longman.
Raymond, J. (1982). What we don't know about the evaluation of writing. College Composition
and Communication, 33(4), 399–403.
Römer, U. (2005). Progressives, Patterns, Pedagogy: A Corpus-driven Approach to English
Progressive Forms, Functions, Contexts and Didactics. Amsterdam: John Benjamins.
Salager-Meyer, F. (1997). I think that perhaps you should: A study of hedges in written discourse
analysis. In T. Miller (ed.), Functional Approaches to Written Text: Classroom Applications
(pp. 105–117). Washington DC: USIA.
Sampson, G., & McCarthy, D. (eds.). (2004). Corpus Linguistics: Readings in a Widening Disci-
pline. London: Continuum.
Samraj, B. (2002). Introductions in research articles: Variation across disciplines. English for
Sanders, T. (1997). Semantic and pragmatic sources of coherence: On the categorization of
coherence relations in context. Discourse Processes, 24(1), 119–148.
Sanders, T., & Noordman, L. G. (2000). The role of coherence relations and their linguistic
markers. Discourse Processes, 29(1), 37–60.
Schiffrin, D. (1981). Tense variation in narrative. Language, 57, 45–62.
Schiffrin, D. (1985a). Conversational coherence: The role of well. Language, 61, 640–667.
Schiffrin, D. (1985b). Multiple constraints on discourse options: A quantitative analysis of
causal sequences. Discourse Processes, 8, 281–303.
Schiffrin, D. (1987). Discourse Markers. Cambridge: CUP.
Schiffrin, D. (1994). Approaches to Discourse. Oxford: Blackwell.
Schiffrin, D., Tannen, D., & Hamilton, H. (eds.). (2001). The Handbook of Discourse Analysis.
Oxford: Blackwell Publishers.
Scollon, R., & Scollon, S. W. (2001). Discourse and intercultural communication. In D. Schiffrin,
D. Tannen & H. Hamilton (eds.), The Handbook of Discourse Analysis (pp. 538–547).
Oxford: Blackwell.
Scott, M. (2004a). Definition of key-ness [Electronic Version]. Wordsmith Tools online manual.
Retrieved November 28, 2005 from http://www.lexically.net/downloads/version4/html/index.html.
Scott, M. (2004b). Wordsmith Tools (Version 4.0) [Computer software]. Oxford: OUP.
Scott, M., & Tribble, C. (2006). Textual Patterns: Key Words and Corpus Analysis in Language
Education. Amsterdam: John Benjamins.
Simpson, R., & Mendis, D. (2003). A corpus-based study of idioms in academic speech. TESOL
Quarterly, 37, 419–441.
Sinclair, J., & Coulthard, M. (1975). Towards an Analysis of Discourse. Oxford: OUP.
Stubbs, M. (1983). Discourse Analysis: The Sociolinguistic Analysis of Natural Language. Oxford:
Blackwell.
Suarez, L., & Moreno, A. (Forthcoming). The rhetorical structure of academic book reviews of
literature: An English-Spanish cross-linguistic approach. In U. Connor, E. Nagelhout & W.
Rozycki (eds.), Contrastive Rhetoric: Reaching to Intercultural Rhetoric. Amsterdam: John
Benjamins.
Swales, J. (1981). Aspects of Article Introductions. Birmingham, UK: University of Aston.
Swales, J. (1984). Research into the structure of introductions to journal articles and its
application to the teaching of academic writing. In R. Williams & J. Kirkman (eds.), Common
Grounds: Shared Interests in ESP and Communication Studies (pp. 77–86). New York NY:
Pergamon Press.
Swales, J. (1990). Genre Analysis: English for Academic and Research Settings. Cambridge: CUP.
Swales, J. (2004). Research Genres: Explorations and Applications. Cambridge: CUP.
Swales, J., & Burke, A. (2003). It's really fascinating work: Differences in evaluative adjectives
across academic registers. In P. Leistyna & C. Meyer (eds.), Corpus Analysis: Language
Structure and Language Use. Amsterdam: Rodopi.
Swales, J., & Luebs, M. (2002). Genre analysis and the advanced second language writer. In E.
Barton & G. Stygall (eds.), Genre in the Classroom: Multiple Perspectives (pp. 105–119).
Mahwah NJ: Lawrence Erlbaum.
Swales, J., & Najjar, H. (1987). The writing of research article introductions. Written
Communication, 4, 175–192.
Tannen, D. (1984). Conversational Style: Analyzing Talk among Friends. Norwood NJ: Ablex.
Tannen, D. (1987). Repetition in conversation: Toward a poetic of talk. Language, 63, 574–605.
Tannen, D. (1989). Talking Voices: Repetition, Dialogue, and Imagery in Conversational Dis-
course. Cambridge: CUP.
Thompson, D. (1993). Arguing for experimental facts in science. Written Communication, 10(1),
106–128.
Thompson, D., & Ye, Y. (1991). Evaluation in the reporting verbs used in academic papers.
Applied Linguistics, 12(4), 365–382.
Thompson, S. (1983). Grammar and discourse: The English detached participial clause. In F.
Klein-Andreu (ed.), Discourse Perspectives on Syntax. New York NY: Academic Press.
Thompson, S. (1985). Grammar and written discourse: Initial vs final purpose clauses in text.
Text, 5(1–2), 55–84.
Thompson, S. (1994). Frameworks and context: A genre based approach to analyzing lecture
introductions. English for Specific Purposes, 13(2), 171–186.
Thompson, S., & Mulac, A. (1991a). A quantitative perspective on the grammaticization of epis-
temic parentheticals in English. In E. Traugott & B. Heine (eds.), Approaches to Grammati-
calization (Vol. 2). Amsterdam: John Benjamins.
Thompson, S., & Mulac, A. (1991b). The discourse conditions for the use of the complementizer
that in conversational English. Journal of Pragmatics, 15, 237–251.
Tirkkonen-Condit, S. (1985). Argumentative Text Structure and Translation (Vol. 18). Jyväskylä,
Finland: Kirjapaino Oy, Sisäsuomi.
Tomlin, R., Forrest, L., Ming Pu, M., & Hee Kim, M. (1997). Discourse semantics. In T. Van Dijk
(ed.), Discourse as Structure and Process. Discourse Studies: A Multidisciplinary Introduction
(Vol. 1). Thousand Oaks CA: Sage.
Tottie, G. (1991). Negation in English speech and writing: A study in variation. San Diego CA:
Academic Press.
Tyler, A. (1995). Patterns of lexis: How much can repetition tell us about discourse coherence?
In J. Alatis, C. Straehle, B. Gallenberger & M. Ronkin (eds.), Linguistics and the Education
of Language Teachers: Ethnolinguistic, Psycholinguistic, and Sociolinguistic Aspects (George-
town University Round Table on Languages and Linguistics). Washington DC: Georgetown
University Press.
Upton, T. (2002). Understanding direct mail letters as a genre. International Journal of Corpus
Linguistics, 7(1), 65–85.
Upton, T., & Connor, U. (2001). Using computerized corpus analysis to investigate the
textlinguistic discourse moves of a genre. English for Specific Purposes: An International Journal,
20, 313–329.
Ure, J. (1982). Introduction: Approaches to the study of register range. International Journal of
the Sociology of Language, 35, 5–23.
Van Dijk, T. (1980). Macrostructures: An Interdisciplinary Study of Global Structures in Discourse,
Interaction, and Cognition. Hillsdale NJ: Erlbaum.
Van Dijk, T. (1981). Episodes as units of discourse analysis. In D. Tannen (ed.), Analyzing Dis-
course: Text and Talk (Georgetown Round Table on Languages and Linguistics). Washing-
ton DC: Georgetown University Press.
Van Dijk, T. (Ed.). (1997). Discourse as Structure and Process. Discourse Studies: A Multidiscipli-
nary Introduction (Vol. 1). Thousand Oaks CA: Sage.
Van Dijk, T., & Kintsch, W. (1983). Strategies of Discourse Comprehension. New York NY: Aca-
demic Press.
Varantola, K. (1984). On Noun Phrase Structures in Engineering English. Turku: University of
Turku.
Ventola, E. (1984). Orientation to social semiotics in foreign language teaching. Applied
Linguistics, 5, 275–286.
Ventola, E., Charles, C., & Kaltenbacher, M. (2004). Perspectives on Multimodality. Amsterdam:
John Benjamins.
Wang, W. (Forthcoming). Newspaper commentaries on terrorism in China and Australia: A
contrastive genre study. In U. Connor, E. Nagelhout & W. Rozycki (eds.), Contrastive Rheto-
ric: Reaching to Intercultural Rhetoric. Amsterdam: John Benjamins.
Ward, G. (1990). The discourse functions of VP preposing. Language, 66, 742–763.
Wells, G. (1999). Dialogic Inquiry: Towards a Sociocultural Practice and Theory of Education.
New York NY: CUP.
Williams, R. (1999). Results section of medical research articles: An analysis of rhetorical
categories for pedagogical purpose. English for Specific Purposes, 18(4), 347–366.
Wood, A. (1982). An examination of the rhetorical structures of authentic chemistry texts.
Applied Linguistics, 3(2), 121–143.
Youmans, G. (1991). A new tool for discourse analysis: The vocabulary management profile.
Language, 67, 763–789.
Youmans, G. (1994). The vocabulary management profile: Two stories by William Faulkner.
Empirical Studies of the Arts, 12(2), 113–130.
Young, L. (1994). University lectures: Macro-structures and micro-features. In J. Flowerdew
(ed.), Academic Listening: Research Perspectives (pp. 159–176). New York NY: CUP.
Zipf, G.K. (1949). Human Behavior and the Principle of Least Effort. Cambridge MA: Addison-
Wesley.
Index

A
affective appeals
  see rhetorical appeals
Aristotle 121–123

B
Baker 37
Bazerman 6, 7, 8
Bhatia 6, 24, 32, 33, 43
biochemistry research articles 73ff.
  abstract discussion in 93–94
  attributed knowledge in 97–99
  coding moves in 83–84
  conceptual versus specific reference in 91–93
  corpus of 75
  discussion section 81–83, 86
  distribution of move types 84–87
  introductions 77–78, 85–86, 158–159
  linguistic analysis of 87ff.
  methods section 78–79, 86, 159–160
  move categories 76–83
  multi-dimensional (MD) analysis of 87–119, 244–249
  multi-dimensional profile of move types 101–103
  multi-dimensional comparison of move types 104–116
  move categories compared to VBDU types 249–253
  presentation of current findings 97–99
  results section 79–81, 86
  stance in 94–96
  VBDUs in 156–60
  see biology research articles
biology research articles 175ff.
  as sequences of VBDUs 186–189, 194–207, 253–257
  abstract discussion in 183–184
  cluster analysis of 211–212
  comparing top-down and bottom-up analyses 242–257
  corpus of 176
  current state of knowledge in 181–182
  discussion sections in 181, 183–184, 185, 188–189, 201–203
  evaluation in 181
  extracting VBDUs from 176–178
  factor analysis of 209–210
  introductions in 176–177, 182, 187–188, 197–199, 204–205
  MD description of research article sections 184–185
  methods section in 182–183, 185, 199–201, 204–205
  multi-dimensional (MD) analysis of 178–189, 209–210, 244–249
  procedural description in 183
  reporting past events in 182–183
  research journal styles 205–207
  VBDU types compared to move categories 249–253
  VBDU types in 190–194, 195ff.
bottom-up approach to discourse analysis
  see corpus-based approaches to discourse analysis

C
CARS 25–28
classroom teaching (university) 213ff.
  as sequences of VBDUs 217–221, 232–237
  corpus of 214
  extracting VBDUs from 214–215
  functional interpretation of VBDU types 230–231
  informational monologue in 227–228, 236–237
  multi-dimensional analysis of 215–217
  stance (personalized framing) in 225–227, 234–235
  VBDU types in 222–229, 230–231
cluster analysis 171–172, 190–194, 222–224
corpus-based approaches to discourse analysis 12–17, 240–242
  advantages of 36–40, 74–75
  bottom-up approaches 14, 16–17, 155–173, 241–242
  comparison of approaches 239ff.
  top-down approaches 13, 14–16, 23–41, 241–242
  methodologies 12–14
corpus design 17–19
Coupland 1
credibility appeals
  see rhetorical appeals

D
discourse
  definitions 1–2, 239–240
  socio-cultural approaches 2, 6–7, 239
  structure beyond the sentence 1–2, 4–6, 9–10, 240
  see language use
  see corpus-based approaches to discourse analysis

E
ethos
  see rhetorical appeals
evaluation in discourse
  see stance

F
factor analysis 88, 264–265
  see multi-dimensional analysis
Fox 9
fundraising discourse 43ff.
  ICIC Fundraising Corpus 44–45
fundraising letters 46ff., 121ff.
  affective appeals in 131–132
  corpus-based analysis of 54–61
  credibility appeals in 129–131
  distribution of move types 55–56
  keywords in 137–141, 148–151
  move analysis of 46–54
  prototypes 58–61
  rational appeals in 125–129, 147
  rhetorical appeals in 125–132, 132–135
  stance features in 61–68
  structural elements in 52–53, 56

G
genre 23ff.
  compared to register 7–9
  prototypes 40, 58–61
genre analysis 23–24

H
Hamilton 1
Hearst 161, 163
Hunston 138
Hyland 67, 123

J
Jaworski 1

K
keywords 138–141
Kwan 32

L
language use 1, 3–4, 239
letters
  fundraising 46ff., 121ff.
  job applications 30–31
linguistic analysis
  of moves 38–39
  grammatical features used for studies 267–271
logos
  see rhetorical appeals

M
Martin 8
Mauranen 7
modal verbs 72
  in fundraising letters 64, 65, 66
move analysis 15, 23
  compared to rhetorical appeals 141–142
  compared to VBDUs 243–244
  corpus-based 36–40, 84–87
  distribution of move types 55–56, 84–87
  inter-rater reliability 35, 68, 83–84
  linguistic analysis of moves 38–39, 63–68, 87ff.
  methodology 32–35
  of biochemistry research articles 76–83
  of direct mail letters 46–54
  of research article introductions 25–28
  of other genres 29–32
  sequences of moves 39–40
  stance features in moves 63–68
multi-dimensional (MD) analysis 4, 261–266
  methodology 87–88, 263–266
  of moves in biochemistry research articles 87–119
  of VBDUs 171
  of VBDUs in biology research articles 178–189
  of VBDUs in classroom teaching 215–217

P
Partington 4
pathos
  see rhetorical appeals
Perelman 123ff.
persuasion
  ethos, pathos, logos 122
  see rhetorical appeals
prototypes
  see genre

R
rational appeals
  see rhetorical appeals
register 7–9
  see genre
reliability
  for coding moves 35, 84
research articles
  introductions 25–28
  move analysis of 25–32
  see biochemistry research articles, biology research articles
rhetorical appeals 16, 121ff.
  affective appeals 125, 131–132, 136, 138–141, 146
  compared to moves 141–142
  corpus-based analysis of 132–135
  credibility appeals 125, 129–131, 145–146
  definitions of 144–146
  ethos, pathos, logos 122
  in fundraising letters 121ff.
  keywords in affective appeals 138–141
  keywords in other appeals 148–151
  linguistic characteristics of 136–141
  rational appeals 124, 125–129, 144–145
Rhetorical Structure Theory 5, 6, 15
Römer 2, 4, 10

S
Schiffrin 1, 3
Scott 2, 138
stance
  grammatical devices for stance 69–72
  in biochemistry research articles 94–96
  in biology research articles 181, 188–189
  in classroom teaching 217, 225–227
  in fundraising letters 61–68
  see rhetorical appeals (affective appeals)
stance adverbials 69–70
  in fundraising letters 64, 67
stance verb + complement clause 70–72
  in fundraising letters 65
Swales 15, 23, 24, 25–28, 38

T
Tannen 1, 9
TextTiling 161–162
texts
  status of 2
  see unit of analysis
text types 171–173
Thompson 3, 5, 10, 15
top-down approach to discourse analysis
  see corpus-based approaches to discourse analysis
Tribble 2

U
unit of analysis 9, 11, 155–156, 243–244

V
Vocabulary-based discourse units (VBDUs)
  cluster analysis of 171–172, 190–194
  compared to moves 243–244
  definition of 156
  exemplified in biochemistry research articles 156–160
  functional interpretation of 230–231
  in biology research articles 176–178, 194–207
  in classroom teaching 214–215
  linguistic analysis of 169–170
  methodology for automatic identification 161–162
  multi-dimensional analysis of 171, 178–189, 215–217
  perceptual correlates of 163–168
  types 170–173, 190–194, 222–229
  VBDU types compared to article sections 192–194

Y
Youmans 6, 9

In the series Studies in Corpus Linguistics (SCL) the following titles have been published thus
far or are scheduled for publication:
29 Flowerdew, Lynne: Corpus-based Analyses of the Problem–Solution Pattern. A phraseological
approach. ix,173pp.+index. Expected November 2007
28 Biber, Douglas, Ulla Connor and Thomas A. Upton: Discourse on the Move. Using corpus analysis
to describe discourse structure. 2007. xii,289pp.
27 Schneider, Stefan: Reduced Parenthetical Clauses as Mitigators. A corpus study of spoken French,
Italian and Spanish. 2007. xiv,237pp.
26 Johansson, Stig: Seeing through Multilingual Corpora. On the use of corpora in contrastive studies.
2007. xxii,355pp.
25 Sinclair, John McH. and Anna Mauranen: Linear Unit Grammar. Integrating speech and writing.
2006. xxii,185pp.
24 Ädel, Annelie: Metadiscourse in L1 and L2 English. 2006. x,243pp.
23 Biber, Douglas: University Language. A corpus-based study of spoken and written registers. 2006.
viii,261pp.
22 Scott, Mike and Christopher Tribble: Textual Patterns. Key words and corpus analysis in language
education. 2006. x,203pp.
21 Gavioli, Laura: Exploring Corpora for ESP Learning. 2005. xi,176pp.
20 Mahlberg, Michaela: English General Nouns. A corpus theoretical approach. 2005. x,206pp.
19 Tognini-Bonelli, Elena and Gabriella Del Lungo Camiciotti (eds.): Strategies in Academic
Discourse. 2005. xii,212pp.
18 Römer, Ute: Progressives, Patterns, Pedagogy. A corpus-driven approach to English progressive forms,
functions, contexts and didactics. 2005. xiv+328pp.
17 Aston, Guy, Silvia Bernardini and Dominic Stewart (eds.): Corpora and Language Learners.
2004. vi,312pp.
16 Connor, Ulla and Thomas A. Upton (eds.): Discourse in the Professions. Perspectives from corpus
linguistics. 2004. vi,334pp.
15 Cresti, Emanuela and Massimo Moneglia (eds.): C-ORAL-ROM. Integrated Reference Corpora for
Spoken Romance Languages. 2005. xviii,304pp.(incl.DVD).
14 Nesselhauf, Nadja: Collocations in a Learner Corpus. 2005. xii,332pp.
13 Lindquist, Hans and Christian Mair (eds.): Corpus Approaches to Grammaticalization in English.
2004. xiv,265pp.
12 Sinclair, John McH. (ed.): How to Use Corpora in Language Teaching. 2004. viii,308pp.
11 Barnbrook, Geoff: Defining Language. A local grammar of definition sentences. 2002. xvi,281pp.
10 Aijmer, Karin: English Discourse Particles. Evidence from a corpus. 2002. xvi,299pp.
9 Reppen, Randi, Susan M. Fitzmaurice and Douglas Biber (eds.): Using Corpora to Explore
Linguistic Variation. 2002. xii,275pp.
8 Stenström, Anna-Brita, Gisle Andersen and Ingrid Kristine Hasund: Trends in Teenage Talk.
Corpus compilation, analysis and findings. 2002. xii,229pp.
7 Altenberg, Bengt and Sylviane Granger (eds.): Lexis in Contrast. Corpus-based approaches. 2002.
x,339pp.
6 Tognini-Bonelli, Elena: Corpus Linguistics at Work. 2001. xii,224pp.
5 Ghadessy, Mohsen, Alex Henry and Robert L. Roseberry (eds.): Small Corpus Studies and ELT.
Theory and practice. 2001. xxiv,420pp.
4 Hunston, Susan and Gill Francis: Pattern Grammar. A corpus-driven approach to the lexical
grammar of English. 2000. xiv,288pp.
3 Botley, Simon Philip and Tony McEnery (eds.): Corpus-based and Computational Approaches to
Discourse Anaphora. 2000. vi,258pp.
2 Partington, Alan: Patterns and Meanings. Using corpora for English language research and teaching.
1998. x,158pp.
1 Pearson, Jennifer: Terms in Context. 1998. xii,246pp.
