
Instrumental Phonetics: Lab Project

“Why is it called a ‘ham-bag’? It doesn’t carry ham!” Cases of assimilation in
Northern Irish speech.

Introduction
The question posed in the title of this paper is one that was recently asked by one of
my (obviously) male friends. This study investigates how this comedic ambiguity
arose, discussing how the place of articulation of a word-final alveolar segment is
affected by the place of articulation of the following word-initial segment. I wish to
investigate to what degree this ‘assimilation’ occurs in my own speech, and whether
fast or slow speech rate affects this.

To avoid any ambiguity, I will adopt the term ‘assimilation’ to refer to any instance
of one segment becoming more like another, encompassing the range of potential
factors at play from deletion to reduction. I state this at the outset to avoid the
uncertainty associated with the differing usages within the literature.

Within the investigation, I also wish to examine whether any evidence can be seen to
support varying degrees of assimilation, or to suggest that it is a process of gestural
overlap that is affecting the segments under analysis. I will also discuss the notion of
phonological representation and, contrastively, phonetic implementation.

While the notion of assimilation is widely discussed within the field, ‘articulatory
phonology’ presents an opposing analysis of what occurs in the processes of
connected speech.

Traditional assimilation theory argues for cases where “two distinct underlying
segments abut, and one “adopts” characteristics of the other to become more similar,
or even identical, to it.” (Nolan p. 262).

Conversely, Browman & Goldstein claim that cases of apparent assimilation are really
due to gestural overlap; where the “basic units” of phonological contrast are known as
“gestures” which are abstract characterizations of natural classes of sounds,
encompassing both duration and time. (Browman and Goldstein, 1992. p. 160)

The conflicting theories that are predominant within phonology affect the predictions
made about the following experiment. If the results suggest complete assimilation of
the segments, this may provide more evidence for the former theory of assimilation.
However, if the majority of the evidence suggests segments becoming more like each
other, rather than actually ‘becoming’ each other, it may enable us to argue more for
the theory of gestural overlap.

They also argue for different possibilities of what could be going on in the mental and
physical domains of speech. While the traditional view (supported by SPE) posits a
great deal of influence coming from the abstract mental phonological representations
and consequent processes, the gestural approach puts less emphasis on the
abstractness of these representations. It does, however, suggest a mental
representation that contains a temporal element.

By analysing the effects on the word-final alveolar, I will attempt to relate any
findings to the theories presented.

To carry out the study, a recording was made of my own voice. I recorded 4
repetitions of 6 test sequences at a careful speech rate and then 4 repetitions of the
same sequences at a fast speech rate. The recordings were saved as sound files, and by
using acoustic analysis software called Praat (Boersma & Weenink, 2009), the speech
was analysed to uncover what is happening to the segments under discussion.

Barry (1985) and Kerswill (1985) were also both interested in the effect of such
connected speech processes. While they used EPG recordings for their analysis, the
findings provide some input on what could be expected from the present study.

They found that in general, there was a tendency to “make less alveolar contact in the
faster tokens”. While this means that the alveolar was not fully realised in these cases,
it does not mean that there was complete assimilation to the following velar. The
prediction, however, that can be made from this is that cases of assimilation should be
more evident in fast speech than slow speech. (Nolan p. 264)

Another prediction is that the results may vary among the different lexical sets as
different vowels are used. “…differences in phonological form will always result in
distinct articulatory gestures.” At this stage I wish to highlight that the study makes
within-speaker comparisons and that the regional dialect may also affect certain
factors. (Nolan, p. 272)

I predict that cases of assimilation will be evident as a result of previous research, but
also because of the question posed in the title. It is a phenomenon that we seem to be
aware of. The title of the paper, however, raises some issues for how this incident
occurs. Nolan questions whether residual alveolars are sufficient to cue the perception
of a lexical alveolar. In the title, this does not appear to have happened. This addresses
the notion of a gradual articulation process as in this case, an articulatory continuum
of forms has not been productive for conveying the meaning to the listener.

Method
The study was carried out on my own voice. I am a 20-year-old female speaker of
Belfast Vernacular English. A common characteristic of Northern Irish (female)
speech is that it tends to be quite fast, and I am often told this about my own speech.
Therefore, it will be interesting to see the results given by the different speech rates.
The experimental materials consisted of 3 pre-designed question-answer sequences:

1. What kind of gadget was it?
   It was a fad gadget.
2. What kind of gadget was it?
   It was a fag gadget.
3. What kind of tablet was it?
   It was a fad tablet.

I then designed my own test materials that had to fit the following criteria:

• A word-final /d/ following a vowel
• Preceding word-initial alveolar and velar stops
• A word-final /g/ following the same vowel preceding a word-initial velar stop
• There must be a back vowel at the beginning of the test sequence

The idea was for the test sentence to be in as comparable a context as possible. There
needed to be the same number of syllables in the test sequences, with phrasal stress
falling on the default position because of the question format of the background
sentence.

After ensuring the above criteria were met, the following materials were decided
upon:

1. What kind of table was it?
   It was the dud table.
2. What kind of cable was it?
   It was the dug cable.
3. What kind of gable was it?
   It was the dug gable.

When the experimental materials were ready, it was time to make the recordings. In
the studio we followed three steps that are used by all sound capturing equipment to
make high quality audio recordings of the human voice for analysis.

Capturing: Within the studio, there were two rooms. The first was an isolation booth,
where an AKG CK 98 hypercardioid microphone was used to record the test sentences.
This kind of microphone is appropriate for capturing the human voice as it is highly
directional and rejects sound from everywhere except directly in front of it. There was
also a control room, where the technical elements of the recording are controlled.
Encoding: At this stage, a recorder inscribes the electrical signal into a device called
the MOTU 828. This device is used to control both the recording level and the volume
of the recordings. It is also an analogue-to-digital converter (ADC) that encodes the
electrical signal as a binary code, which is stored on the computer as an audio (wav)
file. A piece of software called SONAR makes sense of the wav files and allows you
to record, play back and edit them.
Playback: Here the digital information is converted back into an electrical signal. The
sound is then played back through the speakers.

The sampling frequency that we used was 48 kHz, and the bit depth was 16 bits.
This ensures a high-quality recording, as a higher sample rate allows a more accurate
representation of the original sound.
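As a rough illustration of what these settings imply, the sketch below derives three
standard quantities from them: the highest frequency that can be represented (half the
sampling frequency), the number of amplitude steps available at the given bit depth,
and the uncompressed data rate. The function names are my own, and the single (mono)
channel is an assumption, since the number of channels is not stated above.

```python
# Illustrative sketch only: quantities implied by the recording settings.
SAMPLE_RATE_HZ = 48_000  # sampling frequency reported in the text
BIT_DEPTH = 16           # bit depth reported in the text
CHANNELS = 1             # ASSUMPTION: one (mono) channel, not stated in the text


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency representable without aliasing (half the sample rate)."""
    return sample_rate_hz / 2


def quantisation_levels(bit_depth: int) -> int:
    """Number of distinct amplitude values available at this bit depth."""
    return 2 ** bit_depth


def bytes_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * (bit_depth // 8) * channels


print(nyquist_hz(SAMPLE_RATE_HZ))                             # 24000.0
print(quantisation_levels(BIT_DEPTH))                         # 65536
print(bytes_per_second(SAMPLE_RATE_HZ, BIT_DEPTH, CHANNELS))  # 96000
```

With an upper limit of 24 kHz, the F2 values examined in this study (all below about
2.5 kHz on the graphs) sit well inside the range the recording can represent.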

Once the sound files had been saved onto a computer system, they were analysed
using acoustic analysis software called Praat (Boersma & Weenink, 2009). Three F2
measurements were taken from the final three pitch peaks of the vowel preceding the
assimilation site. The test sequences were repeated four times at each speech rate to
allow generalisations to be made, as both F1 and F2 values can differ for the same
vowel.
When the F2 readings had been recorded, the means and standard deviations were
calculated, and these will now be presented within the results.
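The calculation described above can be sketched as follows. This is a minimal
illustration rather than the procedure actually used: the function name is
hypothetical, the numbers are placeholders and not measurements from this study, and
the use of the sample standard deviation (dividing by n - 1) is an assumption, as the
text does not say which formula was applied.

```python
from statistics import mean, stdev


def summarise_f2(repetitions):
    """Summarise F2 readings across repetitions of one test sequence.

    `repetitions` holds one list per repetition, each with the three F2
    readings (Hz) taken from the final three pitch peaks of the vowel.
    Returns one (mean, standard deviation) pair per measurement point.
    """
    points = zip(*repetitions)  # regroup: all 1st readings, all 2nd, all 3rd
    # ASSUMPTION: sample standard deviation (n - 1 in the denominator).
    return [(mean(p), stdev(p)) for p in points]


# PLACEHOLDER values for four repetitions, invented purely for illustration.
example = [
    [1700.0, 1750.0, 1800.0],
    [1720.0, 1760.0, 1820.0],
    [1680.0, 1740.0, 1810.0],
    [1700.0, 1750.0, 1770.0],
]

for index, (m, sd) in enumerate(summarise_f2(example), start=1):
    print(f"F2 reading {index}: mean = {m:.1f} Hz, SD = {sd:.2f} Hz")
```

Keeping the three measurement points separate, rather than pooling them, preserves
the direction of any F2 movement towards the end of the vowel.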

Results
The results can be observed in the following graphs.

[Figure: ‘Fad Gadget vs. Fag Gadget’. Y-axis: Frequency (Hz), 0 to 2500. X-axis:
Speech rate (Slow, Fast). Series: Fag Gadget, Fad Gadget.]

Graph 1: Assimilation of post-lexical alveolars and velars in ‘fad/gadget’ and
‘fag/gadget’ respectively.

Graph 1 shows that the /d/ and /g/ remain somewhat distinct in the slow speech rate,
suggesting that in this condition they do not assimilate. The frequency of the alveolar
in this condition is similar to the frequency readings of the alveolars in the control
condition (see Graph 3) for slow speech.

In the fast speech condition, the alveolar gets higher in frequency and does not match
the control conditions. This graph shows how the frequency reading for the fast
speech of the alveolar is almost exactly the same as the fast speech frequency of the
target velar.

This suggests that complete assimilation has occurred in fast speech. To uncover
whether any assimilation can be observed in slow speech, we need to look more
closely at the results. The following graph will show the F2 readings for the final
three peaks of the preceding vowel for each speech rate.

[Figure: ‘F2 values for mean slow speech’. Y-axis: Frequency (Hz), 0 to 2500. X-axis:
F2 Values (readings 1, 2, 3). Series: fag gadget, fad gadget, fad tablet.]

Graph 1a: Mean F2 readings for slow speech rate for ‘fag gadget’, ‘fad gadget’ and
control setting ‘fad tablet’.

The graph above suggests that in the slow speech rate, the alveolar is partially
assimilating to the velar. The F2 readings for the alveolar are getting higher in
frequency towards the end, thus becoming more like the velar. This provides some
interesting insight to the assimilation/gestural overlap debate and I will talk about this
in greater detail within the discussion.

The following graph shows the average F2 readings in Dug Cable and Dud Gable.

[Figure: ‘Dug Cable vs. Dud Gable’. Y-axis: Frequency (Hz), 1600 to 1950. X-axis:
Speech rate (Slow, Fast). Series: Dug Cable, Dud Gable.]

Graph 2: Assimilation of post-lexical alveolars and velars in ‘dud/gable’ and
‘dug/cable’ respectively.

In Graph 2, we do not see any assimilation patterns. However, the data used to plot
this graph are the averages of all three vowel-final F2 readings, so they may mask
any change across the vowel.
By looking more closely at the individual F2 readings of the preceding vowels, we may
be able to understand better what is going on in this case. The following graphs show
the F2 readings for both slow and fast speech for each test sequence.

[Figure: ‘Dug Cable’. Y-axis: Frequency (Hz), 1650 to 1900. X-axis: F2 Readings
(1, 2, 3). Series: Mean slow speech, Mean fast speech.]

Graph 2a: Shows the final three F2 readings for the preceding vowel in the test
sequence ‘Dug Cable’.

For ‘Dug Cable’, we can see how in both fast and slow speech, the frequency gets
lower as the vowel approaches the velar. However, the fast speech condition drops
more substantially in frequency than that of the slow speech.

[Figure: ‘Dud Gable’. Y-axis: Frequency (Hz), 1550 to 2000. X-axis: F2 Readings
(1, 2, 3). Series: Mean slow speech, Mean fast speech.]

Graph 2b: Shows the final three F2 readings for the preceding vowel in the test
sequence ‘Dud Gable’.

This graph shows the sequence in which we would have predicted to see assimilation.
In the slow speech condition, the F2 stays at a reasonably constant frequency. In the
fast speech condition, the F2 starts to go up, but then starts to drop as though it is
following a similar pattern to those shown in Graph 2a. This may suggest a case of
partial assimilation and shows the importance of closely examining the data.

In order to validate the results, standard deviations were calculated.

Table 1: Standard deviations (Hz) of the final three F2 measurements in fast and slow
speech rates for ‘Dud Gable’ and ‘Dug Cable’.

Sequence     Rate    F2 reading 1    F2 reading 2    F2 reading 3
Dud Gable    Slow    78.9            48.2            15.1
Dud Gable    Fast    295.7           275.1           221.3
Dug Cable    Slow    85.5            151.9           134.5
Dug Cable    Fast    224.3           173.1           153.4

In the table above, we can see how the standard deviations vary between the speech
rates. For ‘Dud Gable’ in slow speech, the standard deviations are relatively small,
suggesting that each reading is close to the mean and that this result is therefore
reliable. For the faster speech, however, the standard deviations are much larger,
demonstrating that the readings are far more widely dispersed and so these results
may not be reliable. This would account for what is shown in Graph 2.
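To make the comparison between the speech rates more concrete, the sketch below takes
the standard deviations reported in Table 1 and, for each test sequence, averages them
within each rate and expresses the fast rate as a multiple of the slow rate. Averaging
standard deviations is only a rough summary of spread (it is not a pooled standard
deviation), the helper names are my own, and the values are assumed to be in Hz.

```python
from statistics import mean

# Standard deviations copied from Table 1 (assumed to be in Hz),
# one value per F2 measurement point.
TABLE_1 = {
    ("Dud Gable", "slow"): [78.9, 48.2, 15.1],
    ("Dud Gable", "fast"): [295.7, 275.1, 221.3],
    ("Dug Cable", "slow"): [85.5, 151.9, 134.5],
    ("Dug Cable", "fast"): [224.3, 173.1, 153.4],
}


def mean_sd(sequence: str, rate: str) -> float:
    """Average of the three reported standard deviations for one condition."""
    return mean(TABLE_1[(sequence, rate)])


def fast_to_slow_ratio(sequence: str) -> float:
    """How many times larger the average spread is in fast than in slow speech."""
    return mean_sd(sequence, "fast") / mean_sd(sequence, "slow")


for seq in ("Dud Gable", "Dug Cable"):
    print(
        f"{seq}: slow = {mean_sd(seq, 'slow'):.1f}, "
        f"fast = {mean_sd(seq, 'fast'):.1f}, "
        f"ratio = {fast_to_slow_ratio(seq):.2f}"
    )
```

On these figures the spread in ‘Dud Gable’ is roughly five and a half times larger in
fast speech than in slow speech, whereas for ‘Dug Cable’ it is roughly one and a half
times larger.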

[Figure: ‘Dud Table vs. Fad Tablet’. Y-axis: Frequency (Hz), 0 to 2500. X-axis:
Speech rate (Slow, Fast). Series: Dud Table, Fad Tablet.]

Graph 3: Graph to show the control setting of post-lexical assimilation of alveolars.

Graph 3 is a representation of the controls that were used. It shows the averages of
the alveolars in fast and slow speech in two different settings. The differences that
can be observed between the two sequences can be attributed to the fact that the
vowels are different in terms of backness. The vowel in ‘fad tablet’ (and the other
sequences containing this vowel) is realised as the front, low vowel /a/, whereas in
‘dud table’ (and its related sequences), the vowel is the open-mid, back vowel /ʌ/.

The front vowel /a/ gives a lower set of F2 values in the control setting because front
vowels are lower with following alveolar rather than velar segments. Back vowels,
however, have higher F2 readings preceding an alveolar because of the more drastic
movement of the tongue from the back of the mouth to the alveolar ridge.

The use of different vowels may have been a factor in yielding different results. In
some of the cases, it was difficult to determine the exact location of the final three
pitch peaks. This may have resulted in some anomalies within results. To rectify this
for future studies, it may be better to either keep similar vowels in each test sequence,
or to take a larger sample size in order to provide a more accurate generalisation.

Discussion
Overall, the results provided some enlightening data. In the first test-sequence, we can
see how complete assimilation has occurred within fast speech, and partially in the
slow speech condition.

In the second test-sequence, while at first there did not appear to be any occurrences
of assimilation, on a closer analysis, we saw a hint of partial assimilation of the
alveolar in the fast speech condition. This, however, is proposed hesitantly as the
standard deviations suggested that the results may not be within reliable confidence
limits.

In the introduction, I highlighted the purpose: to uncover to what degree word-final
alveolars assimilate to following velars. Using specialised acoustic software and
spectrogram readings, I was able to take a much closer and more reliable look at a
real-life speech recording and thoroughly analyse what processes are occurring
underneath the level at which we perceive the speech sound.

Using fast and slow speech rates gave a comparable setting to uncover the effect of
potential connected speech processes. It also enabled a way to relate the findings to
well established literature in comparing the settings.

The results demonstrate a number of aspects that can be discussed with regards to the
hypotheses proposed in the introduction.

In the first set of results (‘fad gadget’ vs. ‘fag gadget’) we can observe a case of
apparent complete assimilation in the fast speech setting. Using alternative techniques
would enable more clarification on the matter but using this particular method
suggests the alveolar has been deleted before the velar. Employing the use of
Electromagnetic Articulatography (EMA) would enable future researchers to uncover
whether or not the tongue-tip is moving at all, therefore proving complete assimilation
if it does not. For now, I will argue for complete assimilation based on my own
results.
This, therefore, provides evidence for the theory of phonological implementation as
one segment becomes another in a specific environment. Consequently it would
appear that these phonological processes have derived specific surface representations
that have served as an input for the motor commands controlling what my articulators
produced. This evidence is favoured by Ladd and Scobbie whose results suggest that
gestural overlap is not, on the whole, a suitable model for assimilation. (Ladd and
Scobbie)

The second set of results (‘dud gable’ vs. ‘dug cable’) seems somewhat more
unreliable. From what can be interpreted, it appears as though some partial
assimilation can be observed. This could also be interpreted as a ‘reduced alveolar’ in
that the tongue tip may still be rising, but not enough to make a full alveolar. This
finding is supported in Ellis and Hardcastle (2002) where the notion of reduced
alveolars is proposed to account for what is being shown in their EPG data.

Reduction can also be thought of as reducing the magnitude of the gesture. In this
sense, it may be that the magnitude of the gesture is decreased as it overlaps with
another gesture. A theory such as this would be popular with researchers such as
Nolan, who suggests this to be a much more intuitive way of organising the
articulators, and Browman and Goldstein, who have also provided evidence for gestural
overlap.

In the second set of results (Graph 2), I was much more aware of pressures to be clear
in pronouncing the separate segments because of the word-initial voiceless velar. As a
result of this, the motor commands were to articulate each process more distinctly. In
the spectrogram this is very obvious in certain cases where it showed a lot of noise in
the voiceless velar, much more than the voiced. This suggests that the articulators
were working to intensify this segment. It is also prudent to note that in the second
set, the voiced alveolar precedes a voiceless alveolar rather than a voiced one as in
dataset one. This extra feature may have skewed the results, especially in the situation
in which the recording took place, which I will now discuss.

I found recording my own voice quite a daunting experience! While there were only 3
fellow students and a technician in the control room, the setting was quite intimidating
and hearing my own voice was quite strange! I found it quite difficult to speak in a
natural way and found myself tripping over words and my mouth getting very dry. I
feel this may have affected the results as it may have changed the true frequencies at
which I speak. While trying to concentrate on reading out the sequences, I did not
intonate my question as a question should be, and so the sentences sound slightly
unnatural.

As I am often told that I speak too quickly, in formal situations I find myself
consciously trying to be more careful with my speech rate. Other socio-linguistic
factors also tend to be at play in my style of speech. The Belfast accent can be quite
broad and so I do tend to enunciate more. This is done in one sense to be understood
better, but it is also an attempt to avoid the broader varieties of my accent. I believe
that these factors may have majorly influenced the data and it may be better to
try to analyse speech produced in a more naturalistic setting.

While the purpose of making the test sequences was not revealed until after the
recording was made, as a linguistics student I found myself speculating over what the
motivation could be. While I attempted to not let this influence my recording, I
believe these speculations still affected the way in which I said the sequences within
the recording.

As a result of this, it may be useful for future studies to make within-subject
comparisons over a range of recordings. I know for myself, the results may have been
more reliable had I had time to go back and do the recordings a few more times in the
studio.

In future studies, I believe it would be extremely beneficial to make use of other
methods of speech analysis alongside the one used here. While the study allowed
generalisations to be made, as I have previously mentioned, it could not be
confidently clarified whether the cases of ‘complete’ assimilation were actually
‘complete’. Using other techniques, a thorough analysis could be made of what the
articulators are truly doing.
References

Browman, C. P. and Goldstein, L. (1992) Articulatory Phonology, An Overview. S.
Karger AG, Basel. pp. 155-180.

Docherty, G. and Ladd, R. D. Laboratory Phonology 2. Cambridge University Press.
pp. 261-280.

Ellis, L. and Hardcastle, W. J. (2002) Categorical and gradient properties of
assimilation in alveolar to velar sequences: evidence from EPG and EMA data.
Journal of Phonetics.

Ladd, R. D. and Scobbie, J. M. External sandhi as gestural overlap?
Counter-evidence from Sardinian.

Boersma, P., & Weenink, D. (2009). Praat: doing phonetics by computer (Version
5.1.04) [Computer Program].
