Introduction
The question posed in the title of this paper is one that was recently asked by one of
my (obviously) male friends. This study investigates how this
comedic ambiguity arose, discussing how the place of articulation of a word-final
alveolar segment is affected by the place of articulation of the following word-initial
segment. I wish to investigate to what degree this ‘assimilation’ occurs in my own
speech, and whether a fast or slow speech rate affects it.
To avoid any ambiguity, I will adopt the term ‘assimilation’ to refer to any instance
of one segment becoming more like another, encompassing the range of potential
factors at play from deletion to reduction. I state this at the outset to avoid the
uncertainty associated with the differing usages within the literature.
Within the investigation, I also wish to examine whether any evidence can be seen to
support varying degrees of assimilation, or to suggest that it is a process of gestural
overlap that is affecting the segments under analysis. I will also discuss the notion of
phonological representation and, contrastively, phonetic implementation.
While the notion of assimilation is widely discussed within the field, ‘articulatory
phonology’ presents an opposing analysis of what occurs in the processes of
connected speech.
Traditional assimilation theory argues for cases where “two distinct underlying
segments abut, and one “adopts” characteristics of the other to become more similar,
or even identical, to it.” (Nolan p. 262).
Conversely, Browman & Goldstein claim that cases of apparent assimilation are really
due to gestural overlap. In their account, the “basic units” of phonological contrast are
“gestures”, which are abstract characterizations of natural classes of sounds,
encompassing both duration and time (Browman and Goldstein, 1992, p. 160).
The conflicting theories that are predominant within phonology affect the predictions
made about the following experiment. If the results suggest complete assimilation of
the segments, this may provide more evidence for the former theory of assimilation.
However, if the majority of the evidence suggests segments becoming more like each
other, rather than actually ‘becoming’ each other, it may enable us to argue more for
the theory of gestural overlap.
The two theories also argue for different possibilities of what could be going on in the
mental and physical domains of speech. While the traditional view (supported by SPE)
posits a great deal of influence coming from the abstract mental phonological
representations and consequent processes, the gestural approach puts less emphasis on
the abstractness of these representations. It does, however, suggest a mental
representation that contains a temporal element.
By analysing the effects on the word-final alveolar, I will attempt to relate any
findings to the theories presented.
To carry out the study, a recording was made of my own voice. I recorded 4
repetitions of 6 test sequences at a careful speech rate and then 4 repetitions of the
same sequences at a fast speech rate. The recordings were saved as sound files, and by
using acoustic analysis software called Praat (Boersma & Weenink, 2009), the speech
was analysed to uncover what is happening to the segments under discussion.
Barry (1985) and Kerswill (1985) were also both interested in the effect of such
connected speech processes. While they used EPG recordings for their analysis, the
findings provide some input on what could be expected from the present study.
They found that in general, there was a tendency to “make less alveolar contact in the
faster tokens”. While this means that the alveolar was not fully realised in these cases,
it does not mean that there was complete assimilation to the following velar. The
prediction, however, that can be made from this is that cases of assimilation should be
more evident in fast speech than slow speech. (Nolan p. 264)
Another prediction is that the results may vary among the different lexical sets as
different vowels are used: “…differences in phonological form will always result in
distinct articulatory gestures” (Nolan, p. 272). At this stage I wish to highlight that the
study makes within-speaker comparisons and that the regional dialect may also affect
certain factors.
On the basis of previous research, and also of the question posed in the title, I predict
that cases of assimilation will be evident. It is a phenomenon that we seem to be
aware of. The title of the paper, however, raises some issues for how this incident
occurs. Nolan questions whether residual alveolars are sufficient to cue the perception
of a lexical alveolar. In the title, this does not appear to have happened. This addresses
the notion of a gradual articulation process as, in this case, an articulatory continuum
of forms has not been successful in conveying the meaning to the listener.
Method
The study was carried out on my own voice. I am a 20-year-old female speaker of
Belfast Vernacular English. A common characteristic of Northern Irish (female)
speech is that it tends to be quite fast, and I am often told this about my own speech.
Therefore, it will be interesting to see the results given by the different speech rates.
The experimental materials consisted of 3 pre-designed question-answer sequences;
The idea was for the test sentence to be in as comparable a context as possible. There
needed to be the same number of syllables in the test sequences, with phrasal stress
falling on the default position because of the question format of the background
sentence.
After ensuring the above criteria were met, the following materials were decided
upon;
When the experimental materials were ready, it was time to make the recordings. In
the studio we followed three steps that are used by all sound capturing equipment to
make high quality audio recordings of the human voice for analysis.
Capturing: Within the studio, there were two rooms. The first was an isolation booth,
where an AKG CK 98 hypercardioid microphone was used to record the test sentences.
This kind of microphone is appropriate for capturing the human voice as it is highly
directional and rejects most sound that does not come from directly in front of it. There
was also a control room, where the technical elements of the recording were controlled.
Encoding: At this stage, a recorder inscribes the electrical signal into a device called
MOTU 828. This device is used to control both the recording level and the volume of
the recordings. It is also an analogue-to-digital converter (ADC) that encodes the
electrical signal as a binary code, which is stored on the computer as an audio (wav)
file. A piece of software called SONAR makes sense of the wav files and allows you
to record, play back and edit them.
Playback: At this stage, the digital information is converted back into an electrical
signal, and the sound is then played back through the speakers.
The sampling frequency that we used was 48 kHz, and the bit depth was 16 bits.
This ensures a high-quality recording, as a higher sampling rate allows a more accurate
representation of the original sound.
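As a rough illustration of what these settings imply, the short Python sketch below works out the highest frequency that a recording at a given sampling rate can represent (the Nyquist frequency, which is half the sampling rate) and the number of distinct amplitude values available at a given bit depth. The function names are illustrative only and are not part of the studio software.

```python
def nyquist_frequency(sample_rate_hz):
    """Highest frequency representable at a given sampling rate."""
    return sample_rate_hz / 2


def quantisation_levels(bit_depth):
    """Number of distinct amplitude values at a given bit depth."""
    return 2 ** bit_depth


sample_rate_hz = 48_000  # sampling frequency reported above
bit_depth = 16           # bit depth reported above

print(nyquist_frequency(sample_rate_hz))  # 24000.0
print(quantisation_levels(bit_depth))     # 65536
```

The formant frequencies of interest in speech lie far below 24 kHz, so this sampling rate leaves ample headroom for the F2 measurements.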
Once the sound files had been saved onto a computer system, they were analysed
using acoustic analysis software called Praat (Boersma & Weenink, 2009). Three F2
measurements were taken from the final three pitch peaks of the vowel preceding the
assimilation site. The test sequences were repeated four times at each speech rate to
allow generalisations to be made, as both F1 and F2 values can differ between
repetitions of the same vowel.
When the F2 readings had been recorded, the means and standard deviations were
calculated and these will now be presented within the results.
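The calculation of a mean and standard deviation can be sketched in Python as follows. The four values are invented for illustration and are not the readings from this study. The sketch also assumes the sample standard deviation (dividing by n - 1), which may not be the formula that was actually used here.

```python
from statistics import mean, stdev

# Hypothetical F2 readings in Hz (NOT the study's data): four repetitions
# of one test sequence at one speech rate, at one measurement point.
f2_readings_hz = [1800.0, 1850.0, 1900.0, 1850.0]

f2_mean = mean(f2_readings_hz)
f2_sd = stdev(f2_readings_hz)  # sample standard deviation (n - 1)

print(f"mean = {f2_mean:.1f} Hz, SD = {f2_sd:.1f} Hz")
```

A small standard deviation relative to the mean indicates that the four repetitions were produced consistently. A large one indicates that the mean should be interpreted with caution.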
Results
The results can be observed in the following graphs.
[Graph 1: Frequency (Hz) by speech rate (slow, fast) for ‘Fag Gadget’ and ‘Fad Gadget’.]
Graph 1 shows that the /d/ and /g/ remain somewhat distinct at the slow speech rate,
suggesting that in this condition they do not assimilate. The frequency of the alveolar
in this condition is similar to the frequency readings of the alveolars in the control
condition (see Graph 3) for slow speech.
In the fast speech condition, the alveolar gets higher in frequency and does not match
the control conditions. The graph shows that the fast-speech frequency reading for the
alveolar is almost exactly the same as the fast-speech frequency of the
target velar.
This suggests that complete assimilation has occurred in fast speech. To uncover
whether any assimilation can be observed in slow speech, we need to look more
closely at the results. The following graph will show the F2 readings for the final
three peaks of the preceding vowel for each speech rate.
[Graph 1a. Title: F2 values for mean slow speech. Axes: frequency (Hz) against F2 values 1 to 3.]
Graph 1a: Mean F2 readings for slow speech rate for ‘fag gadget’, ‘fad gadget’ and
control setting ‘fad tablet’.
The graph above suggests that in the slow speech rate, the alveolar is partially
assimilating to the velar. The F2 readings for the alveolar are getting higher in
frequency towards the end, thus becoming more like the velar. This provides some
interesting insight to the assimilation/gestural overlap debate and I will talk about this
in greater detail within the discussion.
The following graph shows the average F2 readings in Dug Cable and Dud Gable.
[Graph 2: Frequency (Hz) by speech rate (slow, fast) for ‘Dug Cable’ and ‘Dud Gable’.]
In Graph 2, we do not see any assimilation patterns. However, the data used to plot this
graph are the averages of all three vowel-final F2 readings, so it is worth looking more
closely at the individual F2 readings of the preceding vowels to understand better what
is going on in this case. The following graphs show the F2
readings for both slow and fast speech for each test sequence.
[Graph 2a. Title: Dug Cable. Axes: frequency (Hz) against F2 readings 1 to 3, for mean slow speech and mean fast speech.]
Graph 2a: The final three F2 readings for the preceding vowel in the test
sequence ‘Dug Cable’.
For ‘Dug Cable’, we can see how in both fast and slow speech, the frequency gets
lower as the vowel approaches the velar. However, the fast speech condition drops
more substantially in frequency than that of the slow speech.
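One simple way to express the size of the drop in each condition would be the difference between the last and the first of the three F2 readings. The Python sketch below illustrates this with invented values (not the readings from this study). The function name `f2_change` is hypothetical.

```python
def f2_change(readings_hz):
    """Net change in F2 (Hz) from the first to the last measurement point."""
    return readings_hz[-1] - readings_hz[0]


# Hypothetical mean F2 values in Hz at measurement points 1, 2 and 3
# (NOT the study's data).
slow_hz = [1850.0, 1830.0, 1810.0]
fast_hz = [1850.0, 1780.0, 1700.0]

print(f2_change(slow_hz))  # -40.0
print(f2_change(fast_hz))  # -150.0
```

A more negative value indicates a steeper fall in F2 towards the following consonant, so the two speech rates can be compared with a single number per condition.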
[Graph 2b. Title: Dud Gable. Axes: frequency (Hz) against F2 readings 1 to 3, for mean slow speech and mean fast speech.]
Graph 2b: The final three F2 readings for the preceding vowel in the test
sequence ‘Dud Gable’.
This graph shows the sequence in which we would have predicted assimilation.
In the slow speech condition, the F2 stays at a reasonably constant frequency. In the
fast speech condition, the F2 starts to go up, but then starts to drop as though it is
following a similar pattern to that shown in Graph 2a. This may suggest a case of
partial assimilation, and it shows the importance of examining the data closely.
Table 1: Standard deviations of the final three F2 measurements at fast and slow speech
rates for ‘Dud Gable’ and ‘Dug Cable’.
In the table above, we can see how the standard deviations vary between the speech
rates. In ‘Dud Gable’, the standard deviation is relatively small, suggesting that each
result is close to the mean and that this is therefore a reliable result. For the faster
speech, however, the standard deviations are much larger, demonstrating that the
results are much more variable and so may not be reliable. This would
account for what is shown in Graph 2.
[Graph 3: Frequency (Hz) by speech rate (slow, fast) for ‘Dud Table’ and ‘Fad Tablet’.]
Graph 3 is a representation of the controls that were used. It shows the averages of
the alveolars in fast and slow speech in two different settings. The differences that
can be observed between the two sequences can be attributed to the fact that the vowels
are different in terms of backness. The vowel in ‘fad tablet’ (and the other sequences
containing this vowel) is realised as the front, low vowel /a/, whereas in ‘dud
table’ (and its related sequences), the vowel is the open-mid, back vowel /ʌ/.
The front vowel /a/ gives a lower set of F2 values in the control setting because the F2
of front vowels is lower before a following alveolar than before a following velar
segment. Back vowels, however, have higher F2 readings preceding an alveolar because
of the more drastic movement of the tongue from the back of the mouth to the alveolar
ridge.
The use of different vowels may have been a factor in yielding different results. In
some of the cases, it was difficult to determine the exact location of the final three
pitch peaks. This may have resulted in some anomalies within results. To rectify this
for future studies, it may be better to either keep similar vowels in each test sequence,
or to take a larger sample size in order to provide a more accurate generalisation.
Discussion
Overall, the results provided some enlightening data. In the first test sequence,
complete assimilation appears to have occurred in fast speech, and partial assimilation
in the slow speech condition.
In the second test-sequence, while at first there did not appear to be any occurrences
of assimilation, on a closer analysis, we saw a hint of partial assimilation of the
alveolar in the fast speech condition. This, however, is proposed hesitantly as the
standard deviations suggested that the results may not be within reliable confidence
limits.
Using fast and slow speech rates gave a comparable setting to uncover the effect of
potential connected speech processes. It also enabled a way to relate the findings to
well established literature in comparing the settings.
The results demonstrate a number of aspects that can be discussed with regards to the
hypotheses proposed in the introduction.
In the first set of results (‘fad gadget’ vs. ‘fag gadget’) we can observe a case of
apparent complete assimilation in the fast speech setting. Alternative techniques
would clarify the matter further, but the present method
suggests that the alveolar has been deleted before the velar. Using
Electromagnetic Articulatography (EMA) would enable future researchers to uncover
whether or not the tongue tip is moving at all, and so to confirm complete assimilation
if it is not. For now, I will argue for complete assimilation based on my own
results.
This, therefore, provides evidence for the traditional theory of assimilation as a
phonological process, as one segment becomes another in a specific environment.
Consequently, it would appear that these phonological processes have derived specific
surface representations that have served as an input for the motor commands controlling
what my articulators produced. This interpretation is in line with Ladd and Scobbie,
whose results suggest that gestural overlap is not, on the whole, a suitable model for
assimilation (Ladd and Scobbie).
The second set of results (‘dud gable’ vs. ‘dug cable’) seems somewhat more
unreliable. From what can be interpreted, it appears as though some partial
assimilation can be observed. This could also be interpreted as a ‘reduced alveolar’, in
that the tongue tip may still be rising, but not enough to make a full alveolar. This
finding is supported by Ellis and Hardcastle (2002), where the notion of reduced
alveolars is proposed to account for what is shown in their EPG data.
Reduction can also be thought of as reducing the magnitude of the gesture. In this
sense, it may be that the magnitude of the gesture is decreased as it overlaps with
another gesture. A theory such as this would appeal to researchers such as Nolan,
who suggests this to be a much more intuitive way of organising the articulators, and
Browman and Goldstein, who have also provided evidence for gestural overlap.
In the second set of results (Graph 2), I was much more aware of pressures to be clear
in pronouncing the separate segments because of the word-initial voiceless velar. As a
result of this, the motor commands were to articulate each process more distinctly. In
the spectrogram this is very obvious in certain cases, where it showed a lot of noise in
the voiceless velar, much more than in the voiced. This suggests that the articulators
were working to intensify this segment. It is also prudent to note that in the second
set, the voiced alveolar precedes a voiceless alveolar rather than a voiced one as in
dataset one. This extra feature may have skewed the results, especially in the situation
in which the recording took place, which I will now discuss.
I found recording my own voice quite a daunting experience! While there were only 3
fellow students and a technician in the control room, the setting was quite intimidating
and hearing my own voice was quite strange! I found it quite difficult to speak in a
natural way and found myself tripping over words and my mouth getting very dry. I
feel this may have affected the results as it may have changed the true frequencies at
which I speak. While trying to concentrate on reading out the sequences, I did not
intonate my question as a question should be, and so the sentences sound slightly
unnatural.
While the purpose of making the test sequences was not revealed until after the
recording was made, as a linguistics student I found myself speculating over what the
motivation could be. While I attempted to not let this influence my recording, I
believe these speculations still affected the way in which I said the sequences within
the recording.
Boersma, P., & Weenink, D. (2009). Praat: doing phonetics by computer (Version
5.1.04) [Computer Program].