The relationship between language and identity has been explored in a number
of ways in applied linguistics, and this article focuses on a particular aspect of
it: self-representation in the oral history interview. People from a wide range
of backgrounds, currently resident in one large city in England, were asked
to reflect on their lives as part of a project to celebrate the millennium, resulting
in a corpus of 144 transcribed interviews. The article considers the utility of
realist social theory and complexity theory in the analysis of patterns—and
deviations from those patterns—in both the linguistic features of these inter-
views and the social categories to which people are routinely ascribed. Corpus
linguistic software was used to identify discourse features of the corpus as a
whole, and to compare and contrast features produced by different speakers
with reference to the conventional social categories used in quantitative
research. These categories, with their homogenizing limitations, are challenged
with reference to complex causation. The article uses the category of gender
to exemplify the multi-method approach advocated.
INTRODUCTION
This article is concerned with probabilistic patterns, including deviations
from them, in social behaviour in general and in language use in particular.
The approach conceives of social and linguistic processes as complex, agency-
driven, and susceptible to changing kinds of analysis as computer technology
develops. The data used to illustrate the discussion are a corpus of 1.8 million
words of transcribed speech, comprising 144 oral history interviews. As exam-
ples of discourse, they constitute a rich resource for exploring the probabilis-
tic—but not determined—linguistic patterns to be found in a set of texts
that have much in common with each other but that are each nevertheless
unique; each interviewee demonstrates the ever-present potential for linguis-
tic creativity while simultaneously contributing to the collective entity
that emerges as ‘the discourse of life histories’. Located in between the com-
posite identity of the whole set and the unique identity of the individual inter-
view are some patterns associated with the speakers’ membership of various
sub-categories, identifiable from both social and linguistic analysis.
216 PROBABILITIES AND SURPRISES
The original aim of the project within which these interviews were gener-
ated was not specifically to contribute to linguistic research. Rather, it was
to record, from a socio-cultural perspective, the accounts of a diverse range
of residents of the large English city of Birmingham, in order to recognize
and celebrate the experiences of its citizens. From this angle, the corpus
represents a potential means of understanding complex social processes,
where, again, each individual is a social actor contextualized in a complex
web of structured social relations. As Uprichard and Byrne (2006: 668) put
it, ‘narratives are descriptions not of single systems but of the interweaving
The data
The interviews which comprise the data in this study were recorded in 2000–
2001 by two oral historians as part of the ‘Millennibrum’ Project and deposited
in the Local Studies and History section of Birmingham Central Library.
A. SEALEY 217
The aim was to preserve the narrative accounts of a diverse range of residents
of the city at the turn of the millennium, for local people to participate
‘in presenting and recording their experiences, beliefs, contributions to the
community and hopes for the future’ (Dick 2002). Of these 150 interviews,
each lasting up to 90 minutes, 144 have been released for further research in
accordance with the ethical consent procedures employed by the library. For
each interviewee, information is available about place of birth, age, sex, occu-
pation, level of education, marital status, and religious affiliation—all topics
which are usually covered in the interviews themselves.
road, at_the_time, worked, at, quite, city. The semantic fields connoted by the
more content-carrying of these items are fairly self-evident, and accord with
the topics chosen for this project, but another way of representing them is
using the semantic ‘domain clouds’ available in WMatrix (Figure 1).
The WMatrix tool can be used to illustrate the relative frequency differences
between the Millennibrum corpus and a reference corpus in a similar
manner to the ‘tag clouds’ employed in some social networking web sites.
In those, ‘an alphabetically sorted list of words (confusingly for this context
called tags) are shown in a larger font if they are (manually) assigned more
frequently to shared digital photographs . . . or web site bookmarks . . .’ (Rayson
2008b: 533). WMatrix incorporates the USAS tagger (Rayson et al. 2004, cited
in Rayson 2008b), which automatically assigns semantic fields (domains) to
each word or multiword expression in a corpus. The clouds it produces
use larger fonts to indicate greater keyness, so that, in this case, the semantic
domains of ‘moving,_coming_and_going’ and ‘personal_relationship:_
general’, among others, are shown to be significantly more frequent in this
corpus than in the spoken component of the BNC.
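The keyness comparison described above can be sketched in a few lines. The passage does not state which statistic underlies the clouds; the log-likelihood (G2) measure used here is a common choice in corpus linguistics and is an assumption for illustration. The function names and all counts below are hypothetical, not figures from the Millennibrum corpus or the BNC.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) keyness of one item across two corpora.

    freq_* are occurrences of the item (a word or a semantic domain);
    size_* are the total token counts of each corpus.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected counts if the item were spread evenly over both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

def is_overused(freq_study, size_study, freq_ref, size_ref):
    """True if the item is relatively more frequent in the study corpus."""
    return freq_study / size_study > freq_ref / size_ref

# Hypothetical counts for one semantic domain (illustrative only).
score = log_likelihood(9000, 1_800_000, 30_000, 10_000_000)
print(round(score, 2), is_overused(9000, 1_800_000, 30_000, 10_000_000))
```

With one degree of freedom, a score above 3.84 corresponds to p < 0.05. The statistic alone does not show the direction of the difference, which is why the relative frequencies are compared separately.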
From each of these sets of findings, we can conclude that there is
a commonality about these texts, despite the fact that they represent 144
different and, in some ways, strikingly contrasting life stories. Among these
interviewees are individuals who are young and old (their years of birth span
1896–1985), migrants from across the world and ‘Brummies born-and-bred’,
employees from all the major occupation categories of the SOC2000 classifications
(ONS 2000), conventional heterosexual family members and people
with ‘alternative’ lifestyles—and yet all have produced a text in this context
which contributes to a patterned discourse with identifiable features.
to speakers of English, was actually drawn on. From within that stock of
resources, a further sub-set accounts for all of what was produced, and this
can be partially explained with reference to insights from psycholinguistics
(e.g. scripts, schemas, and the ‘economic’ advantages of processing expected,
rather than novel, input and output; e.g. Rumelhart 1975; 1984) and pragmatics
(e.g. the co-operative principle (Grice 1975) and theories of relevance
(Sperber and Wilson 1986)); also helpful are sociologically derived under-
standings about sedimented patterns of social behaviour, with their parallels
in corpus linguistic findings about ‘discourse patterns’ that cluster around
APPROACHES TO ANALYSIS
By means of corpus analysis, it is possible to identify both similarities and
differences between the whole set of interviews and each individual example,
and also to find patterns among sub-sets of the corpus as a whole. In the
tradition of quantitative, variationist analysis, the conventional approach
involves predetermining which demographic ‘variables’ are likely to correlate
with the differential use of particular linguistic features, and to use some kind
that are the products of the generative real system, and the inte-
rior working of the system is not reducible to elements existing
separate and analyzable outside the system. (Byrne 2001: 64)
It is appropriate at this point to summarize the methodological implications
of the social realist theory within which the present study is situated. From
Pawson’s work (which is predominantly concerned with the evaluation of
interventions in social policy), an obvious priority is the identification of con-
texts and mechanisms to explain stability and variation in outcomes. The
interpretations of experience by social actors may be consistent or varied,
described here aims to explore how far the analysis of a corpus of transcribed
oral histories can be enhanced by the application of the integrative method-
ological approaches advocated by social realism within a corpus linguistic
framework.
use of this word, then no men would use it often, and all the women would.
This is patently absurd, but we can look somewhat differently at this
correlation.
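The contrast between a categorical and a probabilistic reading of such a correlation can be made concrete with a small sketch. The speakers, counts, and function names below are entirely hypothetical and are not drawn from the corpus. The point is only that one group can show a higher average rate for an item while the individual speakers' rates still overlap across groups.

```python
from statistics import mean

def per_10k(count, tokens):
    """Normalize a raw count to occurrences per 10,000 tokens."""
    return 10_000 * count / tokens

# Hypothetical speakers: (group label, count of the item, transcript length).
speakers = [
    ("F", 12, 12_000), ("F", 1, 11_000), ("F", 25, 13_000),
    ("M", 3, 12_500), ("M", 14, 11_500), ("M", 0, 10_000),
]

def rates_by_group(rows):
    """Collect each speaker's normalized rate under their group label."""
    groups = {}
    for group, count, tokens in rows:
        groups.setdefault(group, []).append(per_10k(count, tokens))
    return groups

def ranges_overlap(a, b):
    """True if the spread of rates in one group overlaps the other's."""
    return min(a) <= max(b) and min(b) <= max(a)

rates = rates_by_group(speakers)
print(mean(rates["F"]) > mean(rates["M"]))     # group-level tendency
print(ranges_overlap(rates["F"], rates["M"]))  # individual-level overlap
```

In this invented data set both checks hold: the group averages differ, yet some speakers in the lower-average group use the item more often than some speakers in the higher-average group.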
Counter-posed to the implicit determinism of the variables-based approach
are the more interpretive research traditions. Instead of inferring that posses-
sion of the attribute [variable] female leads to an increased use of items such
as lovely, some analysts would stress the performative dimension of gender
through language (e.g. Ochs 1992; Meyerhoff 1996; Holmes and Meyerhoff
1999; Weatherall 2002). Holmes (1997: 203), for example, suggests that atten-
with the corpus as a whole, using the semantic domains tool described
above. This allows the user to see in a concordance line view each of the
items assigned to the semantic domains identified by the tagger. From this,
it is apparent that some of these speakers seem to be very positive in their
narratives, using items classified in the ‘evaluation_good’ category and in the
‘happy’ category significantly more frequently than is found in the corpus
as a whole. Some examples from the concordance lines for these data are
given below (see Figure 2).
This suggests that the interplay between these women’s oral histories and
(c) to accept it. And my mom who’s fab was like I don’t care
e her and she loves you. She was fabulous but found it
modation we wanted, that was the best route, doing that.
award and I won which was really fabulous, probably one
probably one of the best moments of my life which was
been to get there really. It was fab .
(a)
I don’t like the current English culture
I don’t like housework very much. I don’t like ironing, I don’t like washing
I’m very direct and I don’t like to hide because I’m not embarrassed about who I am.
I’ll move away from Birmingham anyway, just because I don’t like the city.
I like open spaces but I don’t like the sea
Birmingham is a very, and I don’t like to use the word, multi-cultural environment.
I don’t like that my disability has [been?] made the subject of competitive spirit
I don’t like fighting or anything. I don’t mind drinking but I don’t like fighting, so that’s it.
I thought I am not going back to this, so I thought, no, I don’t like this
I don’t like to mix much, you know
(b)
I’ve never visited the country and I really think that I don’t want to because I’ve seen too much of a bad side
So, when he came I said I don’t believe in unions so I don’t want to join. Well most of the others had joined
fair enough, people do want to get married, but personally I don’t want to waste all that money on getting married
Figure 3: (a) Concordance lines for ‘I don’t like’ from the interviews of ten
different speakers. (b) Concordance lines for ‘I don’t want to’ from the
interviews of ten different speakers
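Concordance lines of the kind shown in Figure 3 can be produced by a simple keyword-in-context (KWIC) routine. The sketch below is illustrative only and is not the software used in the study. The function name `kwic` and the `width` parameter are hypothetical, and the sample sentence is taken from Figure 3(a).

```python
import re

def kwic(text, phrase, width=30):
    """Return keyword-in-context lines for every match of `phrase`.

    Each line shows up to `width` characters of context on either
    side of the match, with the match itself in square brackets.
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the matches line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

sample = "I like open spaces but I don’t like the sea"
for line in kwic(sample, "I don’t like"):
    print(line)
```

Running the same routine over each transcript separately, rather than over the pooled corpus, is what allows the lines to be attributed to individual speakers.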
In these brief extracts, I suggest, we gain some glimpses into what
Archer (2000: 163) identifies as ‘one of the most important things to probe’,
namely ‘how the self-conscious human being reflects upon his or her invol-
untary placement’. A realist, agency-based perspective has no problem
accounting for this complex configuration of findings as an example of
people making choices from the resources and options available to them—
and in a patterned way that does not necessarily conform to the rather inflex-
ible categories conventionally deployed in quantitative studies. It is more con-
sistent with the ‘performance of gender’ perspective familiar from qualitative
research, but allows for different levels of analytic purchase on the data.
CONCLUSION
In this article I have sketched briefly the ways in which social researchers
seek to account for probabilistic patterns with reference to realist social
theory. This theory has several important features. It conceives of the social
world as an open, complex, dynamic set of inter-related systems. Human
behaviour is understood to be explicable not with reference to single causes
that are effective in categorical ways, but with reference instead to nested
and interacting sets of interests and circumstances, some of them involuntary
and perhaps even unknown to the people affected by them, others the results
of choices made with reference to what people perceive as in their interests.
In this approach, social category labels are seen as neither discursive ephemera
nor deterministic causal variables. Many of the categories routinely used in
survey research, policy evaluations and monitoring practices are less stable
than they may seem at first sight (Sealey and Carter 2001; Carter and Sealey
in press). The social category used as an illustration here was that of ‘gender’.
Once assumed to be an essential, deterministic attribute firmly linked to
(c) I always had a sense that I was different from everyone else from
asked me to ask you why you’re different to us. And I think that
colour. I knew I was different and at home, our life at home wa
communicated with her family was very different to the way that I’d
ifferent, you know language was different and even then, I mean I
but I knew things were different in the way that they acted within
great deals of class differences and certain children doing better
you get compared to the other and that might have given her a b
university, that makes you different and you’re not the same person a
and so I was quite horrified at other people’s personal habits, I thi
r in’92, I did n’t realise how segregated the city was and did n’t
Figure 4: (a) Selected concordance lines for semantic domain of ‘not part of a
group’ in one interviewee’s transcript. (b) Selected concordance lines for
semantic domain of ‘not part of a group’ in a second interviewee’s transcript.
(c) Selected concordance lines for semantic domain of ‘comparing_different’ in
a third interviewee’s transcript
we reflect upon nature and upon practice. Without such referential reality
there would be nothing substantive to reflect upon; but without our
reflections we would have only a physical impact upon reality.’
Contemporary research that is consistent with this theoretical outlook
makes use of various methods, including the ever-increasing processing
capacity of computers. The methods used are iterative, applying different
scales of analysis, as is consistent with a view of the social world as comprising
different kinds of entities with different properties and powers that operate
on different timescales.
ACKNOWLEDGEMENTS
I am grateful to Sian Roberts, Helen Lloyd, and Malcolm Dick for access to, and information about,
the Millennibrum interviews, as well as to the people who recorded them for posterity. I should
also like to thank Pernilla Danielsson, Paul Rayson, and the Collaborative Research Network
at the University of Birmingham which funded the post-editing of the transcripts, most ably
carried out by Juliet Herring. Grateful acknowledgements are also due to Bob Carter, three anon-
ymous referees and the editors of the journal for helpful comments on an earlier version of
this article.
REFERENCES
Archer, M. 2000. Being Human: The Problem of Agency. Cambridge University Press.
de Beaugrande, R. 1997. New Foundations for a Science of Text and Discourse: Cognition, Communication, and the Freedom of Access to Knowledge and Society. Ablex.
de Beaugrande, R. 1999. ‘Linguistics, sociolinguistics, and corpus linguistics: ideal language versus real language,’ Journal of Sociolinguistics 3/1: 128–139.
Bischoping, K. 1993. ‘Gender differences in conversation topics, 1922–1990,’ Sex Roles 28/1–2: 1–18.
Byrne, D. 2001. ‘What is complexity science? Thinking as a realist about measurement
Holmes, J. and M. Meyerhoff. 1999. ‘The community of practice: theories and methodologies in language and gender research,’ Language in Society 28: 173–183.
Hunston, S. 2002. Corpora in Applied Linguistics. Cambridge University Press.
Hunston, S. and G. Francis. 2000. Pattern Grammar: A Corpus-Driven Approach to the Lexical Grammar of English. John Benjamins.
Kipers, P. S. 1987. ‘Gender and topic,’ Language in Society 16: 543–557.
Rayson, P. 2008b. ‘From key words to key semantic domains,’ International Journal of Corpus Linguistics 13/4: 519–550.
Rayson, P., D. Archer, S. L. Piao and T. McEnery. 2004. The UCREL semantic analysis system. In Proceedings of the Workshop on Beyond Named Entity Recognition Semantic Labelling for NLP Tasks in association with 4th International Conference on Language Resources and Evaluation (LREC 2004), 25th May 2004, European Language Resources Association,
Sealey, A. and B. Carter. 2004. Applied Linguistics as Social Science. Continuum.
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford University Press.
Sinclair, J. M. 2004. Trust the Text: Language, Corpus and Discourse. Routledge.
Sperber, D. and D. Wilson. 1986. Relevance. Blackwell.
Stubbs, M. 1996. Text and Corpus Analysis. Blackwell.
Stubbs, M. 2001. ‘Texts, corpora and problems