Language Studies:
Stretching the Boundaries
Edited by
Copyright 2012 by Andrew Littlejohn and Sandhya Rao Mehta and contributors
All rights for this book reserved. No part of this book may be reproduced, stored in a retrieval system,
or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or
otherwise, without the prior permission of the copyright owner.
ISBN (10): 1-4438-3972-8, ISBN (13): 978-1-4438-3972-3
CHAPTER SIX
THE ROLE OF FORENSIC LINGUISTICS
IN CRIME INVESTIGATION
ANNA DANIELEWICZ-BETZ
Abstract
This paper considers the extent to which forensic linguistics can be
considered a science, and outlines some ways in which it is useful in legal
proceedings, including voice identification, the interpretation of police-suspect interaction, verification of police reports (including the illegal
practice of verballing) and cross-cultural insights into speech patterns in a
courtroom context. The paper provides a closer examination of one
particular area, that of authorship attribution, particularly in SMS
messages, and concludes by raising some ongoing controversies in forensic
linguistics and by discussing future prospects.
Is This A Science?
The primary difference between forensic and non-forensic methods in
linguistics is the scientific approach. In forensic linguistics, the scientific
method requires hypothesis testing and a litigation-independent testing of
the method for its accuracy. These tests are performed with robust controls
regarding data quantity, data sources, and analytical objectivity.
Restrictions in applying linguistic expertise in the context of law are
due to varying degrees of acceptability in the courtroom, varying degrees
of reliability related to shortcomings such as the brevity of documents,
small data samples, general characteristics of language (for example,
generic language features of suspects), and the intrinsic nature of language
as something in constant change. The quality of evidence from this
emerging field also depends considerably on the experience and
knowledge of individual linguists involved in a given case. Courts in many
countries admit forensic evidence but have differing criteria. In the United
States, for example, the so-called Daubert standard, a rule of evidence
regarding the admissibility of expert witnesses' testimony in federal legal
proceedings states that evidence based on innovative or unusual scientific
knowledge may only be admitted after it has been established that it is
reliable and scientifically valid. The Daubert test is based on peer review,
error rates, testing, and acceptability in the relevant scientific community.
The police usually lack the authority to make promises such as "We'll
go easy on you if you confess", yet this is implied in their requests to
comply. The problem is, as Solan and Tiersma (2005, p. 38) point out, that
people who are stopped by the police tend to interpret ostensible requests
as commands or orders, yet, in contrast, their own indirect wishes to get a
lawyer often go unnoticed (for example, "Maybe I should talk to a
lawyer"). The problem is exacerbated by the poor comprehensibility
of the Miranda warning and other police language
for many suspects, including defendants who may be (semi-)illiterate,
speakers of another language, or too young or mentally-challenged to
understand their rights to remain silent and seek legal advice.
In any case, the asymmetric nature of the relationship between
authority figures (the police) and the defendant, who may be disadvantaged
Authorship Attribution
Authorship attribution is the science of inferring characteristics of the
author from the characteristics of documents produced by that author. The
key task is to establish who said or wrote something which is to be used as
evidence. Attribution is facilitated by measuring average word length,
average number of syllables per word, article/determiner frequency, and
type-token ratio (a measure of lexical variety). Furthermore, punctuation
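
The four measures just listed can be computed with very little code. The following is a minimal sketch in Python, not a validated forensic tool: the function names, the regular-expression tokenizer, the vowel-group syllable heuristic and the restriction of "articles" to a, an and the are all assumptions made for illustration.

```python
import re

# Assumption: only the three English articles are counted here.
ARTICLES = {"a", "an", "the"}


def tokenize(text):
    """Lowercase the text and return its word tokens (internal apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def style_features(text):
    """Return four simple stylometric measures for a text."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no word tokens")
    n = len(tokens)
    return {
        "avg_word_length": sum(len(t) for t in tokens) / n,
        "avg_syllables_per_word": sum(count_syllables(t) for t in tokens) / n,
        "article_frequency": sum(t in ARTICLES for t in tokens) / n,
        # Type-token ratio: distinct word forms divided by total words.
        "type_token_ratio": len(set(tokens)) / n,
    }
```

Calling style_features on a questioned text and on known writing samples yields feature values that can then be compared. Note that all four measures depend on text length and become unstable for very short documents.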
gr8t!
r u goin?
information as well (cf. Clement and Sharp 2003; Mikros and Argiri,
2007). It seems that low-level features like character N-grams
(subsequences of n items from a given sequence, for example, phonemes,
syllables, letters, or words) can successfully be applied in stylistic text
analysis (cf. Keselj et al. 2003; Stamatatos 2006; Grieve 2007). A crucial
need is, however, to increase the available benchmark corpora so that they
cover many natural languages and text domains. It is also very important
for the evaluation corpora to offer control over genre, topic and
demographic criteria.
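
To make the notion of a character N-gram profile concrete, the following minimal Python sketch counts overlapping character N-grams and compares two profiles. The function names are hypothetical, and cosine similarity is used here only as a generic, well-known comparison measure; it is not claimed to be the measure used in any of the studies cited above.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Return a Counter of the overlapping character n-grams in text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(p, q):
    """Cosine similarity between two n-gram count profiles (0.0 to 1.0)."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # at least one text is shorter than n characters
    return dot / (norm_p * norm_q)


# Hypothetical usage: both strings are invented for illustration.
known = char_ngrams("r u goin? gr8t!", n=2)
questioned = char_ngrams("r u goin 2 the party?", n=2)
print(cosine_similarity(known, questioned))
```

Because the profile is built from raw characters, it captures spelling habits, abbreviations and punctuation without any language-specific preprocessing, which is one reason such features are attractive for short, informal texts.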
Speaker Identification
One of the controversies discussed in, for example, Hollien (2001), is
the disagreement in the so-called scientific community on the degree of
accuracy with which examiners can identify speakers under all conditions.
Surprisingly, many suspects will voluntarily give a sample of their voice
Testimony
Controversies also arise in relation to witness/police testimony. All the
cases of second-hand verbal (apparently verbatim) material (cf. "I don't
know exactly what he said, but I know he said he did it" in Solan and
Tiersma, 2005: 98) can be considered unreliable since, as discussed below,
human memory is incapable of retaining the exact wording even after a
couple of seconds, not to speak of months or years. Moreover, reproduced
utterances may be presented in isolation, lacking the original paralinguistic
and situational (pragmatic) context. There also remains a great deal of
research to be done to increase our insight into the effect of estimator
variables on speaker identification by ear witnesses. For the time being,
such identification should be treated with considerable caution.
Scientific criteria for court admissibility of testimony still pose a
problem as they differ from country to country and from state to state (as
in the case of the US). Required qualifications of examiners and presenters
of forensic linguistic material (so-called forensic experts) have not yet
been clearly specified, either.
forensic setting (ransom notes, blackmail, etc.) are usually much too short
to make a reliable identification. Moreover, which linguistic features are
reliable indicators of authorship, and how reliable those features are,
remains to be discovered. As Tiersma (ibid) points out, research is
ongoing, and the availability of large corpora of speech and writing
samples suggests that the field may advance in the future (although the
typically small size of the documents in most criminal cases will always
be a problem).
It is therefore crucial for the attribution methods to be robust and
applicable to a limited number of short texts. However, several important
questions remain open in relation to authorship attribution, the most
important being the required text length. Although various
studies have reported promising results with short texts (fewer than
1,000 words; cf. Sanderson and Guenter, 2006; Hirst and Feiguina, 2007),
it has not yet been possible to define a text-length threshold for reliable
authorship attribution.
In the final section of this paper, I want to turn to some of the future
challenges for forensic linguistics and possible ways towards scientific
legitimisation of the discipline.
Bibliography
Amos, O. 2008. The text trap. The Northern Echo. Retrieved January 5,
2012 from http://www.thenorthernecho.co.uk/features/leader/2076811.the_text_trap/
Chaski, C.E. 2005. Empirical evaluations of language-based author
identification techniques. International Journal of Speech, Language
and the Law, 8 (1), pp. 1-65.
Chaski, C.E. 2005. Who's at the keyboard? Authorship attribution in digital
evidence investigations. International Journal of Digital Evidence 4
(1), pp. 1-13.
Chaski, C. E., and H. J. Chmelynski. 2005a (pending publication). Testing
twenty variables for author attribution by discriminant function
analysis.
Chaski, C. E., and H. J. Chmelynski. 2005b (pending publication). Testing
twenty variables for author attribution by logistic regression.
Clement, R., and D. Sharp. 2003. N-gram and Bayesian classification of
documents for topic and authorship. Literary and Linguistic
Computing, 18 (4), 423-447.
Clifford, B.R. 2009. The role of the expert witness. In G. Davies, R. Bull
and C. Hollin (eds.). Forensic Psychology. New York: Wiley.
Eades, D. 2008. Courtroom talk and neocolonial control. Berlin and New
York: Mouton de Gruyter.
Eades, D. 2000. I don't think it's an answer to the question: Silencing Aboriginal
witnesses in court. Language in Society, 2000 (29), pp. 161-195.
Grieve, J. 2007. Quantitative authorship attribution: An evaluation of
techniques. Literary and Linguistic Computing, 22 (3), pp. 251-270.
Hirst, G. and O. Feiguina. 2007. Bigrams of syntactic labels for authorship
discrimination of short texts. Literary and Linguistic Computing, 22
(4), pp. 405-417.
Hollien, H. 2001. Forensic Voice Identification. London: Academic Press.
Keselj, V., F. Peng, N. Cercone, and C. Thomas. 2003. N-gram-based
author profiles for authorship attribution. Proceedings of the Pacific
Association for Computational Linguistics, pp. 255-264.
Kniffka, H. 2007. Working in Language and Law: A German Perspective,
Basingstoke: Palgrave Macmillan.
Kredens, K. 2000. Forensic linguistics and the status of linguistic evidence
in the legal setting. Unpublished Ph.D. dissertation. University of
Łódź.
Leech, G. 1983. Principles of Pragmatics. London: Longman.