Forensic Phonetics

Forensic Phonetics:
Issues in speaker identification evidence
Andrew Butcher
Centre for Human Communication Research

Flinders Medical Research Institute
Flinders University, Adelaide, Australia
Abstract
The field of forensic phonetics has developed over the last 20 years or so and embraces a number
of areas involving analysis of the recorded human voice. The area in which expert opinion is
most frequently sought is that of speaker identification – the question of whether two or more
recordings of speech (from suspect and perpetrator) are from the same speaker. Automated
analysis (in which Australia is a world leader) is only possible where recording conditions are
identical. In the most frequently encountered real-world forensic situation, comparison is
required between a police interview recording and recordings made via telephone intercepts or
listening devices. This necessitates a complex procedure, involving auditory and acoustic
comparison of both linguistic and non-linguistic features of the speech samples in order to build
up a profile of the speaker. The most commonly used measures are average fundamental
frequency and the first and second formant frequencies of vowels. Much work is still needed to
develop appropriate statistical procedures for the evaluation of phonetic evidence. This means
estimating the probability of finding the observed differences between samples from the same
speaker and the probability of finding those same differences between samples from two
different speakers. Thus there needs to be an acceptance that the outcome will not be an absolute
identification or exclusion of the suspect. By itself, your voice is not a complete giveaway.
1. The field of forensic phonetics

The use of phonetics as a forensic tool has developed over the past 20 years or so (Hollien 1990;
Baldwin & French 1991), but with the rapid expansion in the number of cases depending on the
evidence of covert audio and video recordings in recent years, forensic phonetics now plays a
crucial role in an increasing number of criminal trials. A forensic phonetician may be asked to
prepare reports in a number of areas, of which the following four are the most frequently
encountered:
1.1 Speaker identification. This is by far the most commonly required task and the subject of the
remainder of this paper.
1.2 Disputed utterances. In view of the usually very poor quality of covert police recordings
(especially those made via a listening device), there is often ample scope for a defendant to
challenge the prosecution’s version of what was actually said in the course of a recorded
conversation. Forensic phoneticians may be asked to prepare a report on the quality of the
recording and the intelligibility of the speech. They may also be asked to prepare an ‘objective’
transcript of the recording.
1.3 Tape authentication. Occasionally a defendant (or a civil litigant) may have cause to question
whether an audio recording has been tampered with in some way. Usually the claim is that
certain sections have been excised or perhaps transposed. It is not generally within the
competence of a phonetician to give an opinion as to the physical condition of a tape, but there
may be evidence within the acoustic signal (‘pops’ or abrupt changes in either the signal itself or
the background noise) which would be indicative of electronic editing. However, currently
available software makes ‘seamless’ editing comparatively easy, and a phonetician may be
needed to give an opinion on the only remaining evidence of any tampering – linguistic evidence
in the form of unnatural changes in rhythm, tempo or intonation.
1.4 Voice line-ups. The practice of confronting witnesses of a crime with a tape recorded ‘voice
line-up’, where the voice of a suspect is included amongst a series of ‘foils’, may be used to
obtain evidence of identification in cases where, in the course of committing a crime, an unseen
or masked perpetrator has spoken in the presence of the witnesses. This recording is played to
the witness(es) and they are asked to state whether they can identify any of the voices as that of
the perpetrator. In order to be entirely fair to the suspect, there are a number of criteria which
need to be observed (Broeders & Rietveld 1995; Hollien, Huntley, Künzel & Hollien P 1995). As
with visual identification parades, a general principle of fairness in the conducting of voice
line-ups is that there should be no feature of any of the voices or the recordings which would
cause non-witnesses to pick out a particular speaker (whether suspect or foil) as being different
from the rest. A phonetician may be consulted on aspects of the construction of the tape and the
administration of the confrontation.
2. Speaker Identification: analysis and measurement

I would estimate that at least 90% of my work as a forensic phonetician is concerned with the
identity of speakers in audio recordings. There is a good deal of misunderstanding surrounding
the capabilities of speech technology in this area. Some of this misunderstanding dates from the
1960s, when the “Voiceprint” technique became a favourite tool of certain police forces, most
notably in the USA. This methodology, which involved the visual inspection and impressionistic
comparison of sound spectrograms, was regarded sceptically by the scientific community at the
time, and has since been entirely discredited (Hollien 1990, 2002; Gruba & Poza 1995). The
term “Voiceprint” suggests that the technique is analogous to forensic techniques such as
fingerprinting or DNA analysis. There are a number of reasons why this is an inappropriate
analogy. Firstly, there is no single feature of the voice which is unique to every speaker. Unlike
the vanishingly small possibility in the case of fingerprints or DNA molecules, it is quite
possible for two speakers to be, for all practical purposes, identical in some respect. Secondly,
most (if not all) of the features of the voice which are measurable in recordings of the quality
typically encountered in the forensic context are capable of being consciously changed by the
speaker. These include voice pitch, aspects of voice quality, consonantal articulation, and vowel
quality. At present it is not impossible for a skilled mimic to defeat the forensic voice
identification procedure. Thirdly, for most of the voice features, we do not have sufficient data
on the normal population to know what the chances are of two speakers being similar or identical
with respect to that feature. Finally, acoustic parameters vary as a consequence of differences in
recording conditions as well as of differences in the voice itself. Australia leads the world in the
technology of automatic speaker recognition (in 2001 a team from the RCSAVT Speech
Research Lab at Queensland University of Technology won two of the categories for single
speaker detection tasks in the National Institute of Standards & Technology’s benchmark tests
on speaker recognition), but automatic speaker recognition is not yet able to separate out
variation due to speaker differences from variation due to recording conditions (and it is doubtful
whether it will ever be able to). Thus automatic speaker recognition techniques are of limited use
in the typical forensic situation, where a voice recorded over the telephone or via a listening
device is to be compared with a voice recorded in a police interview room. The intervention of a
phonetically and linguistically qualified human operator is required. The main components of the
procedure are an auditory analysis and an acoustic analysis, each of which in turn has a number
of component parts. Voice ID is therefore more appropriately compared with a technique such as
a ‘photo-fit’ type of procedure, where a number of features are considered as part of an overall
profile.
2.1. Auditory analysis
This part of the analysis involves careful and repeated listening by the expert, noting features of
the voices in question under four basic headings. Firstly, voice quality features are ascertained.
This means describing ‘voice’ in the technical sense – i.e. the sound made by the vibration of the
vocal folds – and ignoring for the moment any variations contributed by the resonances of the
throat, mouth and nasal passages above. It can be done using one of a number of descriptive
frameworks (e.g. Isshiki & Takeuchi 1970; Laver 1980; Wendler, Rauhut & Krüger 1986; Oates
& Russell 1998), whereby aspects of the voice can be quantified according to parameters such as
‘roughness’, ‘strain’, ‘creakiness’, ‘breathiness’ and so on – terms which are meaningful to other
phoneticians and speech scientists and which describe in as accurate and objective a way as
possible the auditory impressions of the listener. Secondly, the investigator attends to the non-
linguistic characteristics of the speech which are not produced by the larynx. This means
listening to the effects of the long-term setting of the throat, the tongue and lips and the
resonances of the nasal passages and sinuses. This is known as the articulatory setting, and here
too, established descriptive frameworks are available (Laver 1980; Esling 1994) which rate the
voice according to such parameters as ‘hypernasality’, ‘pharyngealisation’, ‘labialisation’, as
well as vertical position of the larynx. The third set of parameters relates to aspects of (mainly
vowel) articulation which provide clues to the speaker’s geographical and social background. In
long-established linguistic communities such as in the United Kingdom and Europe, this part of
the analysis can provide very useful information. In a recently established community such as
(non-Aboriginal) Australia, the information which can be gleaned is usually quite scanty.
Australian English accents are traditionally classified on a three-point scale as being ‘Broad’,
‘General’ or ‘Cultivated’ (Mitchell & Delbridge 1965), but there are very few features which
enable us to pinpoint the speaker’s geographical origins with any accuracy. One or two
pronunciations are peculiar to Queensland and another one or two distinguish speakers with a
South Australian background. A more recent phenomenon is the “pan-ethnic” accent (sometimes
known as “wogspeak”) which has developed among second- and subsequent-generation
Australians of non-English-speaking background (Warren 1999). The final component of the
auditory analysis is the identification of any idiosyncratic pronunciation features which may be
present. The more commonly occurring idiosyncrasies involve the articulation of consonants,
and include various types of ‘lisp’, the labialising of ‘r’ (‘rabbit’ becomes something like
‘wabbit’) and the pronunciation of ‘th’ as ‘v’. Apart from this, speakers may exhibit various
kinds of dysfluency, including stuttering, ‘cluttering’ and slurring of words.
2.2 Acoustic analysis

In order to carry out an acoustic analysis the recording must be digitised to a computer hard
drive or compact disc (a sampling rate of 22.05 kHz and a 16-bit resolution are normally used).
The recordings are usually edited so as to contain only the voice of the speaker under
investigation. Published recommended minimal sample sizes for forensic speaker comparison
range from 15 s to 120 s. With regard to fundamental frequency measurement (F0), one recent
review of the forensic phonetic literature concludes: “If the communicative behaviour may be
considered ‘normal’, 15-20 sec of speech will be sufficient to calculate speaker F0” (Braun
1995). The analyses described below can be performed using any one of a number of currently
available speech analysis software packages.
2.2.1 Fundamental frequency

The rate of vibration of the vocal folds during voiced segments of speech is what the listener
perceives as the pitch of the voice. This is known as the fundamental frequency, and is
measured in cycles per second or ‘Hertz’ (Hz). Obviously this is capable of variation by the
speaker, and indeed this is one of the main ways of conveying both grammatical and emotional
meaning in speech.
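As a rough illustration of how a fundamental frequency value might be derived from a digitised signal, the following Python sketch estimates F0 for a single voiced frame by autocorrelation. It is not the method of any particular speech analysis package: the function name `estimate_f0`, the 60 to 400 Hz search range and the synthetic test tone are all assumptions made for the example.

```python
import math

def estimate_f0(samples, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame by autocorrelation."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]                  # remove any DC offset
    lag_min = int(sample_rate / f0_max)              # shortest period searched (samples)
    lag_max = min(int(sample_rate / f0_min), n - 1)  # longest period searched (samples)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Average product of the frame with a copy of itself shifted by `lag`
        score = sum(x[i] * x[i + lag] for i in range(n - lag)) / (n - lag)
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

SAMPLE_RATE = 22050   # Hz, the digitisation rate mentioned in section 2.2
TRUE_F0 = 113.0       # Hz, hypothetical test tone (not case data)

# Synthetic "voiced" frame: a fundamental plus a weaker second harmonic
frame = [
    math.sin(2 * math.pi * TRUE_F0 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 2 * TRUE_F0 * t / SAMPLE_RATE)
    for t in range(2048)
]

f0 = estimate_f0(frame, SAMPLE_RATE)
print(f"estimated F0: {f0:.1f} Hz")   # should be close to the 113 Hz test tone
```

In practice such an estimate would be repeated over many short frames of voiced speech and the results averaged to give the speaker's mean F0. Real recordings would also need voicing detection and noise handling, which this sketch leaves out.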
Figure 1: Waveform (above) and pitch contour (below) of the utterance “We went to Woolloomooloo”
Figure 1 shows a waveform and pitch contour for an Australian English sentence. The waveform
at the top represents the tiny variations in air pressure caused by the transmission of the sound
waves. The bottom trace shows the variation in frequency of those vibrations over time: the
fundamental frequency. Each speaker has a particular range of fundamental frequency which
s/he habitually uses and within which s/he feels most comfortable and this is an important
measure for forensic purposes, because it is one of the few measures for which we know the
distribution amongst the adult population at large. The average speaking fundamental frequency
for an adult Caucasian male is 113 Hz, and 50% of the male population lie somewhere between
100 and 130 Hz in spontaneous speech (Künzel 1989). The corresponding average for females is
225 Hz. Figure 2 shows how this measure may be used in building up a voice profile. In this case
the voice of a person issuing a ransom demand over the telephone is compared with the voices of
two suspects (Butcher & Moody 1999). Clearly the fundamental frequency of suspect 1 is much
closer to that of the perpetrator than is the fundamental frequency of suspect 2. Furthermore both
the perpetrator and suspect 1 differ markedly from the population mean and in the same
direction.
[Figure 2 plot: mean fundamental frequency (Hz), scale 60 to 240 Hz, for perpetrator, suspect 1 and suspect 2 across recordings]
Figure 2: Graph of mean fundamental frequency of three speakers in a number of recordings. The vertical lines
represent one standard deviation either side of the mean. The dashed line represents the mean for the
adult male population.
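The comparison in Figure 2 can be sketched numerically as follows. The population mean of 113 Hz and the 100 to 130 Hz range are taken from the text; everything else is an assumption for illustration. In particular, the F0 values are invented, and the population standard deviation is derived by assuming that F0 is roughly normally distributed, which the text does not claim.

```python
from statistics import NormalDist, mean, stdev

POP_MEAN_HZ = 113.0            # adult male mean quoted in the text
POP_MID_50_HZ = (100.0, 130.0) # range said to contain 50% of the male population
# ASSUMPTION: population F0 is roughly normal, so the central 50% spans
# about +/-0.674 standard deviations around the mean.
POP_SD_HZ = (POP_MID_50_HZ[1] - POP_MID_50_HZ[0]) / (2 * NormalDist().inv_cdf(0.75))

def summarise(f0_values_hz):
    """Mean, standard deviation and distance from the population mean (in SDs)."""
    m = mean(f0_values_hz)
    return {"mean": m, "sd": stdev(f0_values_hz), "z": (m - POP_MEAN_HZ) / POP_SD_HZ}

# Hypothetical F0 measurements (Hz) for each speaker; NOT data from the case
recordings = {
    "perpetrator": [178, 185, 190, 182, 176, 188],
    "suspect 1": [174, 181, 186, 179, 183, 177],
    "suspect 2": [118, 124, 121, 115, 127, 120],
}

summaries = {name: summarise(values) for name, values in recordings.items()}
for name, s in summaries.items():
    print(f"{name}: mean {s['mean']:.1f} Hz, SD {s['sd']:.1f} Hz, z = {s['z']:+.2f}")

# Which suspect's mean F0 lies closer to the perpetrator's?
closest = min(
    ("suspect 1", "suspect 2"),
    key=lambda s: abs(summaries[s]["mean"] - summaries["perpetrator"]["mean"]),
)
print("closest to perpetrator:", closest)
```

A large z value shared by the perpetrator and one suspect corresponds to the observation that both differ markedly from the population mean in the same direction. On its own this narrows the field but does not identify the speaker.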
2.2.2. Long term average spectrum

A spectrum is a plot of energy against frequency. It shows the distribution of energy throughout
the frequency range during a very small time ‘slice’ of sound. A long-term spectral energy
profile is derived by averaging a large number of spectral slices over a longer sample of speech,
thus eliminating information on the details of individual sounds. This is, in theory, the best
measure of what we perceive as voice quality and vocal effort, as well as the overall effects of
long-term articulatory settings. It is this kind of measure which is used in most automated
speaker recognition procedures (Butcher & Moody 1999). Unfortunately, such measures also
reflect differences in recording conditions, and often these may be sufficiently large to mask any
similarities between speakers. Figure 3 illustrates this problem. In Figure 3a the voice of a
suspect recorded via a mobile telephone is compared with unknown voices from four other calls
made from the same phone. Clearly there is a high degree of similarity between the voices. In
Figure 3b, however, the voice of the suspect is shown under three different conditions: recorded
on standard audio cassette via telephone, recorded in free field on microcassette and recorded in
free field on VHS-C cassette. In this case the three spectra look quite different – in particular
there is a large discrepancy between the telephone recording and the two free-field recordings,
which represent the two recording conditions most commonly offered for comparison in the
forensic situation. Clearly this measure can only be used in the limited number of situations
where the conditions under which recordings have been made can be assumed to be similar.
[Figure 3 plots, panels (a) and (b): energy (dB) against frequency (Hz)]
Figure 3: Long-term average spectra (a) from 5 separate phone calls, allegedly by a single speaker and (b) from the
same speaker, recorded under three different conditions
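The averaging of spectral slices described in this section can be sketched as below. This is a minimal illustration under stated assumptions rather than the procedure of any particular analysis package: the frame length, hop size, Hann window and direct DFT are choices made for the example, and the test signal is synthetic.

```python
import cmath
import math

def frame_power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame, computed with a direct DFT."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(w * cmath.exp(-2j * math.pi * k * i / n)
                    for i, w in enumerate(windowed))
        power.append(abs(coeff) ** 2)
    return power

def long_term_average_spectrum(samples, frame_len=128, hop=64):
    """Average the power spectra of successive frames; return the result in dB."""
    totals = [0.0] * (frame_len // 2 + 1)
    n_frames = 0
    for start in range(0, len(samples) - frame_len + 1, hop):
        power = frame_power_spectrum(samples[start:start + frame_len])
        totals = [t + p for t, p in zip(totals, power)]
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    # The small constant avoids log(0) in empty frequency bins
    return [10 * math.log10(t / n_frames + 1e-12) for t in totals]

SAMPLE_RATE = 22050
FRAME_LEN = 128
# Synthetic signal (hypothetical): a strong component at bin 6, a weaker one at bin 20
strong_hz = 6 * SAMPLE_RATE / FRAME_LEN
weak_hz = 20 * SAMPLE_RATE / FRAME_LEN
signal = [math.sin(2 * math.pi * strong_hz * t / SAMPLE_RATE)
          + 0.3 * math.sin(2 * math.pi * weak_hz * t / SAMPLE_RATE)
          for t in range(2048)]

ltas = long_term_average_spectrum(signal, FRAME_LEN, hop=64)
peak_bin = max(range(len(ltas)), key=ltas.__getitem__)
print(f"peak at about {peak_bin * SAMPLE_RATE / FRAME_LEN:.0f} Hz")
```

Because every frame passes through the same recording channel, any filtering by a telephone line or cassette recorder is averaged into the profile along with the voice. This is why spectra from different recording conditions cannot be compared directly.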
2.2.3 Vowel formant frequencies

When a speaker pronounces a vowel sound, a number of resonances are produced in the vocal
tract (the tube formed by the mouth and throat cavities). These are known as formants. The
frequencies of the lowest two or three formants change according to the ‘colour’, ‘quality’ or
‘timbre’ of the vowel. Formants can be measured from a sound spectrogram, which is a kind of
three-dimensional spectrum. As with the spectrum, the distribution of energy is shown over the
frequency range, but in this case we can see how this distribution varies as a function of time.
Frequency is shown on the vertical axis and time on the horizontal axis, whilst the amount of
energy present is represented by the darkness of the shading. The formants appear as dark
horizontal bands, whose vertical position varies according to the nature of the vowel. This is
illustrated in Figure 4. For example, if a number of speakers pronounce the short ‘a’ vowel in
words such as ‘cat’, ‘bad’, ‘sack’ etc, one might expect to find some small, but consistent
differences between speakers, if the sample is large enough, and likewise for each of the other
vowels of the language.
[Figure 4 spectrogram: the words ‘head’, ‘had’ and ‘hard’; frequency axis 0 to 4.5 kHz, time axis 0 to 1.775 s; formants F1, F2 and F3 labelled in each vowel]
Figure 4: A sound spectrogram of the words ‘head, had, hard’, spoken by an adult male. The dark horizontal bands
(F1, F2, F3) in the vowels represent areas of higher energy known as FORMANTS.
A useful way of summarising vowel formant frequency data from a given speaker is to plot the
mean values of the first formant against the mean values of the second formant for all the
vowels. This provides a characteristic pattern or ‘vowel space’ for the speaker, as shown in
Figure 5, which is based on data measured from the voice of a murder suspect during interview.
In this figure the first formant frequency is shown on the vertical axis and the second formant
frequency on the horizontal axis. The origins of the axes are placed in the top right hand corner,
so that the positions of the points on the chart relate approximately to the position of the tongue
and jaw: vowels pronounced with a forward position of the tongue and spread lips appear on the
left and those with a retracted tongue and rounded lips appear on the right. Vowels with a raised
tongue and closed jaw are at the top and vowels with a lower tongue and open jaw at the bottom.
The individual letters represent a point positioned at the intersection of the means of the first and
second formant frequencies of the vowel in question. The ellipses represent a distance of two
standard deviations around the mean for that vowel, i.e. the area which would include 95% of the
speaker’s vowels of that type.
[Figure 5 plot: second formant frequency (Hz) on the horizontal axis, first formant frequency (Hz) on the vertical axis]
Figure 5: Formant plot of short vowels of a suspect in a police interview recording. The phonetic symbols
represent a point positioned at the intersection of the mean first and second formant frequencies of the
vowel in question and the ellipses represent two standard deviations around the means. From left to
right, the symbols represent ‘i’ as in ‘ring’, ‘e’ as in ‘left’, ‘a’ as in ‘that’, ‘u’ as in ‘up’, ‘o’ as in ‘got’,
and ‘oo’ as in ‘good’.
Figure 6: Comparison of short vowels from a suspect in a police interview recording with corresponding vowels of
a speaker recorded via a listening device. The ellipses are the same as in Figure 5 – i.e. they represent
two standard deviations around the means for the suspect’s voice. The phonetic symbols represent
individual short vowels from the unknown speaker.
In Figure 6 the same ellipses are superimposed on a set of data points representing the formant
frequencies of vowels from an unknown speaker recorded via a listening device. The degree of
overlap between the two speakers can be roughly quantified by calculating the proportion of
vowel points from the unknown speaker which fall within the appropriate ellipse of the suspect
speaker. In this particular diagram, only 50% of the unknown speaker’s vowels fall within the
corresponding ellipse of the suspect speaker. Based on this data alone, there would have to be
considerable doubt that the speakers are the same.
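One way the overlap measure might be computed is sketched below. The ellipse for each suspect vowel is taken from the mean and covariance of that vowel's (F1, F2) tokens, and an unknown token counts as "inside" if its Mahalanobis distance from the mean is at most two. The formant values are invented for illustration, and treating the ellipse as a covariance ellipse is an assumption, since the text does not say how the ellipses were constructed. Note also that, under a bivariate normal assumption, an ellipse at two standard deviations encloses roughly 86% of tokens rather than 95%. The sketch simply applies the two-standard-deviation threshold as described.

```python
from statistics import mean

def ellipse_params(points):
    """Mean and sample covariance of a list of (F1, F2) tokens for one vowel."""
    f1 = [p[0] for p in points]
    f2 = [p[1] for p in points]
    m1, m2 = mean(f1), mean(f2)
    n = len(points)
    s11 = sum((a - m1) ** 2 for a in f1) / (n - 1)
    s22 = sum((b - m2) ** 2 for b in f2) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in points) / (n - 1)
    return (m1, m2), (s11, s12, s22)

def mahalanobis(point, centre, cov):
    """Distance of a token from the vowel mean, in standard-deviation units."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 ** 2
    d1, d2 = point[0] - centre[0], point[1] - centre[1]
    # Quadratic form d^T * inverse(cov) * d, written out for the 2x2 case
    q = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return q ** 0.5

def proportion_inside(suspect_tokens, unknown_tokens, n_sd=2.0):
    """Share of unknown-speaker tokens lying within the suspect's ellipse for that vowel."""
    inside = total = 0
    for vowel, points in unknown_tokens.items():
        centre, cov = ellipse_params(suspect_tokens[vowel])
        for p in points:
            total += 1
            if mahalanobis(p, centre, cov) <= n_sd:
                inside += 1
    return inside / total

# Hypothetical (F1, F2) tokens in Hz; NOT data from the case
suspect = {
    "i": [(380, 2000), (400, 2100), (360, 1950), (390, 2050), (410, 1980)],
    "a": [(700, 1500), (720, 1520), (680, 1460), (710, 1470), (690, 1550)],
}
unknown = {
    "i": [(385, 2020), (520, 1700)],
    "a": [(705, 1490), (640, 1300)],
}

overlap = proportion_inside(suspect, unknown)
print(f"{overlap:.0%} of unknown tokens fall inside the suspect's ellipses")
```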
Data from a different case are shown in Figures 7 and 8. In these plots the mean frequencies of
the vowel sets are compared. In Figure 7 the combined mean values from a perpetrator’s vowels
in a number of phone calls are compared with the values for the equivalent vowels spoken by 20
adult male speakers of General Australian English from the Australian National Database of
Spoken Language (Millar, Vonwiller, Harrington & Dermody 1994). The two patterns look quite
different, and the overall mean difference between the values of the perpetrator and those of
this sample of the general population is 12.2%. Figure 8 shows the same set of perpetrator
vowels compared with those of a suspect. The degree of similarity between the two patterns
appears much greater, and indeed the mean difference between the values for the perpetrator and
those for the suspect is 3.3%. Thus the formant frequencies of the perpetrator are considerably
closer to those of the suspect than they are to those of the general population. Experience
suggests that a variation of 5% or less is of the order expected within a single speaker.
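The text does not state exactly how the mean percentage difference is calculated. The sketch below shows one plausible reading, offered as an assumption only: the absolute difference in each vowel's mean F1 and mean F2 is expressed as a percentage of the perpetrator's value, and these percentages are averaged. The formant values are invented and the function name is hypothetical.

```python
def mean_percent_difference(reference, other):
    """Mean absolute difference in F1 and F2 vowel means, as a % of the reference values."""
    diffs = []
    for vowel, (ref_f1, ref_f2) in reference.items():
        oth_f1, oth_f2 = other[vowel]
        diffs.append(abs(oth_f1 - ref_f1) / ref_f1 * 100)
        diffs.append(abs(oth_f2 - ref_f2) / ref_f2 * 100)
    return sum(diffs) / len(diffs)

WITHIN_SPEAKER_LIMIT = 5.0   # %, the rule of thumb quoted in the text

# Hypothetical mean (F1, F2) values in Hz for two vowels; NOT data from the case
perpetrator = {"i": (400, 2000), "a": (700, 1500)}
suspect = {"i": (412, 1940), "a": (679, 1545)}
population = {"i": (350, 2300), "a": (800, 1350)}

diff_suspect = mean_percent_difference(perpetrator, suspect)
diff_population = mean_percent_difference(perpetrator, population)
print(f"perpetrator vs suspect:    {diff_suspect:.1f}%")
print(f"perpetrator vs population: {diff_population:.1f}%")
print("suspect within single-speaker range:", diff_suspect <= WITHIN_SPEAKER_LIMIT)
```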
These, then, are the major parameters that may be used to build up a profile of two or more
voices for the purposes of forming an opinion as to their overall similarity.
[Figures 7 and 8 plots: axis label ‘first formant frequency (Hz)’]
Figure 7: Comparison of vowels from a perpetrator with vowels from the Australian National Database of Spoken
Language. Each phonetic symbol represents the mean for that vowel in one of the two sets of data. All
means for a given data set are connected by a line, with one line for the perpetrator and one for the ANDOSL data.

Figure 8: Comparison of vowels from a perpetrator with vowels from a suspect. Each phonetic symbol represents
the mean for that vowel in one of the two sets of data. All means for a given data set are connected by a
line, with one line for the perpetrator and one for the suspect.
3. Presenting the evidence

3.1 Problems with ‘Probability’
Having carried out the analyses and formed an opinion, the phonetician must now present the
evidence to the court and express his opinions based upon it. The usual expectation of lawyers
appears to be that the expert give his opinion in the form of an answer – preferably in numerical
terms – to the question “Given the degree of similarity between the speech samples, what is the
probability of the two voices belonging to the same speaker?” And the answer that is required is
something along the lines of: “Given the high degree of similarity between the two speech
samples, there is a very high (90%) probability that these two samples are from the same
speaker”. Some expert (but non-phonetician) witnesses in the field appear to be prepared to
make statements of this kind. This is, however, highly inappropriate and has no statistical basis.
The witness is in fact expressing the probability of a hypothesis, given the evidence. This is
not only logically incorrect, but, according to my understanding, also legally incorrect, as this is
ultimately the job of the court and not of the expert witness. Essentially what forensic
phoneticians have traditionally been asked to do is akin to answering the question “Given that
this creature has wings, what are the chances of it being a bird?” The question the expert witness
should be answering, however, is the equivalent of “Given that this creature is a bird, what are
the chances of it having wings?” Translated back into the real world, this means “If we assume
that the two speech samples are from the same speaker, what is the likelihood of them displaying
this degree of similarity?” In other words s/he should be expressing the probability of the
evidence, given the hypothesis.
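The distinction can be written compactly using Bayes' theorem. The symbols are introduced here for illustration and do not appear in the original: H stands for the hypothesis (the samples are from the same speaker) and E for the evidence (the observed degree of similarity).

```latex
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)},
\qquad \text{so in general} \qquad
P(H \mid E) \neq P(E \mid H).
```

The expert's analysis bears on P(E | H). Moving from there to P(H | E) requires the prior probability P(H), which depends on all the other evidence in the case and is therefore a matter for the court.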
3.2 The Likelihood Ratio

Ideally the evidence of the expert witness should be expressed within the framework of Bayesian
statistics (Robertson & Vignaux 1995). This means answering a question of the type “How much
more likely is this creature to have wings if it were a bird than if it were not a bird?” or in reality
“How much more likely is the given degree of similarity between samples if they were by the
same speaker than if they were by different speakers?”. This involves the use of a likelihood
ratio, which is arrived at in the following way (Rose 2002). The phonetician observes and
quantifies a certain degree of similarity (X) between the perpetrator and suspect speech samples.
Let’s assume, for the sake of argument, that published research has shown that X degree of
similarity is found in 85% of paired samples from the same speaker, but in only 15% of paired
samples from different speakers. This means that the
probability of observing X degree of similarity between samples from the same speaker would
be 85% and the probability of finding X degree of similarity between different speakers would
be 15%. The likelihood ratio is then 85 divided by 15 or 5.67.
A likelihood ratio greater than 1.0 supports the prosecution hypothesis – i.e. shows that the
degree of similarity found between the speech samples is more likely if they were by the same
speaker than if they were by different speakers. A likelihood ratio less than 1.0 supports the
defence hypothesis – i.e. shows that the degree of similarity found between the speech samples is
more likely if they were by different speakers than if they were by the same speaker. The value
of the likelihood ratio thus quantifies the strength of the evidence, and likelihood ratios from
different areas of forensic evidence can be combined. Each successive likelihood ratio should be
evaluated in terms of the degree of confidence in the assertion of guilt before consideration of
the evidence in question (the so-called ‘prior odds’) (Robertson & Vignaux 1995).
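Using the illustrative 85% and 15% figures from the example above, the arithmetic of the likelihood ratio and its combination with prior odds can be sketched as follows. The prior odds and the second likelihood ratio are invented for the example. Multiplying likelihood ratios together assumes that the items of evidence are independent, which is an assumption of this sketch and would need to be justified in any real case.

```python
def likelihood_ratio(p_evidence_if_same, p_evidence_if_different):
    """Probability of the evidence under the same-speaker hypothesis divided by
    its probability under the different-speaker hypothesis."""
    return p_evidence_if_same / p_evidence_if_different

def posterior_odds(prior_odds, *likelihood_ratios):
    """Update prior odds by each likelihood ratio in turn (assumes independent evidence)."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

def odds_to_probability(odds):
    """Convert odds in favour of a hypothesis to a probability."""
    return odds / (1 + odds)

lr_voice = likelihood_ratio(0.85, 0.15)   # figures from the text
print(f"voice likelihood ratio: {lr_voice:.2f}")

# Hypothetical values, for illustration only
prior = 1 / 100      # odds before the voice evidence is considered
lr_other = 20.0      # likelihood ratio from some other, independent evidence

after_voice = posterior_odds(prior, lr_voice)
after_both = posterior_odds(prior, lr_voice, lr_other)
print(f"odds after voice evidence: {after_voice:.4f}")
print(f"odds after both items:     {after_both:.4f} "
      f"(probability {odds_to_probability(after_both):.2f})")
```

The same likelihood ratio thus leads to very different posterior odds depending on the prior odds. This is why the expert reports the ratio and leaves the final assessment to the court.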
4. Speaker Identification: where we are now

At the beginning of the previous subsection I used the word “ideally” and indeed the subsequent
paragraphs describe the ideal situation. The key sentence is the one beginning “Let’s assume, for
the sake of argument, that published research has shown …”. Unfortunately, however, we cannot
assume any such thing at this time. Our knowledge of what is ‘normal’ or ‘average’ for the
population is severely lacking in most areas and the data that we do have is inevitably limited to
the majority population groups – i.e. in the case of Australia to the Anglo-Celtic community –
and to somewhat artificial ‘laboratory’ conditions. Furthermore, the statistical modelling of the
highly complex variation that occurs in speech is still in its infancy, and is still a long way from
being able to cope with the distinction between variation due to speaker differences and variation
due to differences in recording conditions – as we have seen, a crucial requirement in the
forensic context. Thus statements as to probability made by forensic phoneticians are at this
stage limited by these two significant constraints. Whilst every scrap of available quantitative
data on the general population will be taken into account, such statements will inevitably rely
heavily on the extensive experience and accumulated knowledge of the individual expert.
References
BALDWIN J & FRENCH P (1991) Forensic Phonetics. London & New York: Pinter.
BRAUN A (1995) Fundamental frequency – how speaker-specific is it? In A Braun & J-P Köster
(eds) Studies in Forensic Phonetics. Trier: Wissenschaftlicher Verlag Trier, 9-23.
BROEDERS APA & RIETVELD ACM (1995) Speaker identification by earwitness. In A Braun & J-P
Köster (eds) Studies in Forensic Phonetics. Trier: Wissenschaftlicher Verlag Trier.
BUTCHER AR & MOODY MP (1999) The case of the ‘third voice’: a rare opportunity for closed
set comparison in the forensic context. Paper presented at the Annual Conference of the
International Association for Forensic Phonetics, York, England.
ESLING JH (1994) Voice quality. In R.E. Asher & J.M.Y. Simpson (eds) The Encyclopedia of
Language and Linguistics. Oxford: Pergamon Press, 4950-4953.
GRUBA JS & POZA FT (1995) Voicegram identification evidence. 54 American Jurisprudence
Trials 1.
HOLLIEN H (1990) The Acoustics of Crime. New York & London: Plenum.
HOLLIEN H (2002) Forensic Voice Identification. San Diego: Academic Press.
HOLLIEN H, HUNTLEY RA, KÜNZEL HJ & HOLLIEN PA (1995) Criteria for earwitness lineups.
Forensic Linguistics 2, 143-153.
ISSHIKI N & TAKEUCHI Y (1970) Factor analysis of hoarseness. Studia Phonologica 5, 37-44.
KÜNZEL HJ (1989) How well does average fundamental frequency correlate with speaker height
and weight? Phonetica 46, 117-125.
LAVER J (1980) The Phonetic Description of Voice Quality. Cambridge: Cambridge University
Press.
MILLAR JB, VONWILLER J, HARRINGTON JM & DERMODY P (1994) The Australian National
Database of Spoken Language. Proceedings of the International Conference on Acoustics,
Speech and Signal Processing, Adelaide, 67-100.
MITCHELL AG & DELBRIDGE A (1965) The pronunciation of English in Australia (revised
edition). Sydney: Angus and Robertson.
OATES JM & RUSSELL A (1998) Learning voice analysis using an interactive multi-media
package: Development and preliminary evaluation. Journal of Voice 12, 500-512.
ROBERTSON B & VIGNAUX GA (1995) Interpreting Evidence: Evaluating Forensic Science in
the Courtroom. New York: John Wiley & Sons.
ROSE P (2002) Forensic Speaker Identification. London: Taylor & Francis.
WARREN J (1999) ‘Wogspeak’: transformations of Australian English. Journal of Australian
Studies 62, 86-94.
WENDLER J, RAUHUT A & KRÜGER H (1986) Classification of voice qualities. Journal of
Phonetics 14, 483-488.