
Presentation: 19

Cues from vowels to identify the stop consonant


Manner (voicing)
- Temporal: vowel duration
- Spectral: F1 (onset + steady state + offset); rate of F1 offset slope

Place
- Spectral: vowel tilt; formants F2 and F3; formant transition

Vowel duration preceding voiced and voiceless stops: the duration of the vowel nucleus, including onset and offset formant transitions.

VOWEL DURATION

Crowther & Mann (1992)


Vowel duration variations can cue the voicing of word-final alveolar stops. They studied vocalic duration and F1 offset frequency in productions of "pod" and "pot."

Cue: vowel duration (VD)
Removal of the closure interval and release burst forces the listener to rely on the cues available in the vocalic segment.

Degraded performance therefore indicates an inability to use the vowel cues.

This contrast effect is analogous to the case in vision, where a gray stimulus appears darker when viewed next to a white stimulus than when viewed next to a black one.

[Figure: a long closure duration following a short vocalic segment]

A short vocalic interval makes a long closure interval seem even longer, and thereby favors perception of a final consonant as voiceless.

Vocalic duration cues voicing by enhancing the closure duration cue. In audition, a long closure segment seems even longer when heard in the context of a short vocalic segment. Vocalic duration thus comes to cue voicing by enhancing the saliency of the closure interval duration.

Because the contrast hypothesis is based upon general principles of perception, it predicts that the capacity of vowel duration to modify perception of closure duration (and thus the ability to use vocalic duration as a voicing cue) should be universal.

Crowther & Mann (1992)

CROSS-LINGUISTIC STUDY: ENGLISH - JAPANESE - MANDARIN

For English, durational differences between vowels and vocalic nuclei preceding voiced vs. voiceless stops are large compared to many other languages.

Crowther and Mann (1992)


Aim: To investigate the effectiveness of vocalic duration as a cue to final stop voicing.
Stimuli: Synthetic CVC stimuli.
Languages: English, Japanese and Mandarin.
Result: Native English speakers were more responsive to differences in vocalic duration than the Japanese and, especially, the Mandarin speakers.

Reason

Lack of experience with final stop consonants: neither Japanese nor Mandarin allows CVCs ending in stops. Therefore, native speakers of these languages make less extensive use of VD as a cue to the following stop.

Experience with a phonemic vowel length distinction


However, Flege and Port (1981) reported that Arabic speakers make no productive use of VD as a voicing cue; the same is true for Japanese speakers, yet both groups perceive VD differences well.

The phonemic long versus short vowel contrast in these languages may have improved their performance. Mandarin, by contrast, has no phonemically long and short vowels.

Fischer and Ohde (1990)

2ND STUDY ABOUT VOWEL DURATION

Fischer & Ohde (1990)


Vowel duration contrast can be used to identify the voicing of word-final velar stops. Vowels used:
- //: high F1 steady state
- /I/: F1 intermediate between // and /i/
- /i/: low F1 steady state
This allowed the effect of duration and spectral properties to be examined for a vowel with a relatively short intrinsic duration (/I/) compared with /i/ and //, which have long intrinsic durations.

11 CVC continua - Synthesized (3 vowel contexts)


Stimulus structure: initial /b/ with a 15-ms onset transition, a steady state corresponding to the vowel, and a final 50-ms transition in which F2 and F3 simulated a transition to a velar consonant.
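A synthetic formant track of this kind can be sketched as piecewise-linear segments: an onset transition, a steady state and a final transition. The Python sketch below is illustrative only; the helper `formant_track`, its 1-ms sampling step and the F2 values in the example (1100, 1700 and 2000 Hz) are hypothetical and are not the synthesis parameters of Fischer and Ohde (1990).

```python
def formant_track(onset_hz, steady_hz, offset_hz, steady_ms,
                  onset_ms=15, offset_ms=50, step_ms=1):
    """Piecewise-linear formant track (hypothetical helper, illustration only).

    Segments: onset transition (onset_ms) -> steady state (steady_ms)
    -> final transition (offset_ms), sampled every step_ms.
    """
    def ramp(start, end, dur_ms):
        n = dur_ms // step_ms
        # n samples moving linearly from start toward end (end itself excluded)
        return [start + (end - start) * i / n for i in range(n)]

    track = ramp(onset_hz, steady_hz, onset_ms)
    track += [steady_hz] * (steady_ms // step_ms)
    track += ramp(steady_hz, offset_hz, offset_ms)
    track.append(offset_hz)  # close the final transition at the offset value
    return track


# Hypothetical F2 track: /b/-like low onset, vowel steady state, rise toward a velar.
f2 = formant_track(1100, 1700, 2000, steady_ms=100)
print(len(f2), f2[0], f2[15], f2[-1])
```

Lengthening or shortening `steady_ms` changes vowel duration while leaving both transitions intact, which is the kind of manipulation a duration continuum needs.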

Task: Listeners were asked to judge the final sound.

[Figure: results for the //, /I/ and /i/ continua, panels (a) to (c)]

The range of durations in panel (b) is shorter than in panels (a) and (c) because of intrinsic duration differences between short and long vowels.

SYNTHETIC VERSUS NATURAL SPEECH STUDIES

VD as a sufficient cue
The results of several studies in which preceding vowel duration was varied found it to be a sufficient cue to the voicing distinction (Krause, 1982; O'Kane, 1978; Raphael, 1972; Raphael, Dorman, & Liberman, 1980; Raphael et al., 1975).

Raphael (1972)
Used the Pattern Playback to synthesize syllables ending in voiced and voiceless stops

Final consonants
Voiceless: preceded by short-duration vowels. Voiced: preceded by long-duration vowels.

After recording the voiced series, each voiced stimulus was converted to a voiceless stimulus by eliminating the final 50-ms F1 transition. This produces the bottom stimulus.

Conclusion
Preceding vowel duration was both a necessary & sufficient cue to syllable-final voicing. Similar findings were also reported in two subsequent synthesis studies by Raphael et al. (1975, 1980).

Experimental evidence supporting VD consists primarily of a series of synthetic speech studies by Raphael and his colleagues (Raphael, 1972; Raphael et al., 1975, 1980).

Studies which do not support vowel duration

Several studies have challenged the sufficiency of vowel duration (Hogan & Rozsypal, 1980; Wardrip-Fruin, 1982a).

Wardrip-Fruin (1982a)


Vowels preceding final voiced stops could be reduced in duration by one-third without eliciting voiceless responses.

Hogan and Rozsypal (1980)


Expanding the vowel durations of naturally produced syllables ending in voiceless stops did not result in voiced-stop judgments. The duration of the vowel nucleus was systematically reduced using a digital gating technique.

Intrinsically long versus intrinsically short

INFLUENCE OF VOWEL CONTEXT ON VOICED/VOICELESS RECOGNITION

Hogan & Rozsypal (1980)


They compared the long vowels /i/ and /a/ with short vowels. Vowels with intrinsically short durations, such as /I/ and //, showed little change in voiced/voiceless recognition curves regardless of vowel duration. Listeners may assign low weight to VD as a voicing cue when vowels with intrinsically short durations are used.

At times listeners need to rely on other vowel cues, for example when VDs are equal.

Cue 2: F1 onset, steady state and offset

Why do we need to study different vowel contexts?
Because temporal and spectral properties may be weighted differently as a function of steady-state level, it is important to examine these properties in different vowel contexts (Hillenbrand et al., 1984; Summers, 1988).

Summers (1988)
F1 structure provides information for final-consonant voicing

F1 offset values may play a role in voicing perception only in some vowel contexts.
He examined whether differences in F1 frequency in the initial-transition and steady-state portions of preceding vowels provide perceptual information about postvocalic voicing, using cascade formant synthesis software (Klatt, 1980).

Summers (1988)
He varied the steady-state F1 value of the vowels in /bVb/ and /bVp/ syllables and found that a lower steady-state F1 yielded more final [+voice] identification responses. This perceptual experiment followed an earlier production study (Summers, 1987) showing that, before [+voice] consonants, F1 is lower throughout most of the vowel.

Both vowels examined contained high F1 frequencies.

The series were labeled in terms of F1 steady-state (SS) frequency and F1 offset frequency.


For each vowel, there were 3 types of series:
- High-high (HH): high F1 onset + steady-state (ON+SS) frequency and a high F1 offset frequency
- Low-high (LH): low F1 ON+SS frequency and a high F1 offset frequency
- Low-low (LL): low F1 ON+SS frequency and a low F1 offset frequency

HH: steeper F1 transition

F1 frequency trajectory for the 115-ms member of each stimulus series.

Which is more important: onset, steady state or offset?


The HH versus LL series data suggest that judgments of final-consonant voicing were influenced by all 3 cues.

However, F1 offset frequency differences alone may explain the observed pattern.

HH versus LH (Low onset and SS, High Offset)


This comparison is not confounded by offset cues: the two series had equal F1 offset frequencies and differed in F1 ON+SS frequency and in F1 final transition slope. Stimuli containing a low F1 ON+SS were heard as voiced: stimuli from the LH series were identified as voiced at shorter vowel durations than stimuli from the HH series. These results show a significant effect of F1 ON+SS.

The results do not support F1 offset frequency as a voicing cue, because the LH versus LL comparison, in which offset frequency differences were present (low F1 offset frequency: voiced; high F1 offset frequency: voiceless), was not statistically significant. Steady-state information may outweigh the onset cues for conveying final voicing, because its longer duration conveys more information.

Learning about the combination of frequency and intensity characteristics associated with gradual versus abrupt termination of the preceding vowel

F1 offset frequency and intensity decay time.

Offset characteristics of vowels, such as the F1 transition preceding final stop consonants, are important to the perception of voicing (Parker, 1974; Walsh & Parker, 1981, 1983; Walsh, Parker, & Miller, 1987).

Crowther & Mann (1992)

CROSS-LINGUISTIC STUDY: ENGLISH - JAPANESE - MANDARIN

Crowther and Mann (1992)


Why did they choose a low vowel?
A low vowel (as in "pod" and "pot") was chosen because F1 offset frequency is a more effective cue to final-consonant voicing for low vowels than for high vowels (Fischer and Ohde, 1990).

Crowther and Mann (1992)


Aim: To investigate the effectiveness of F1 offset frequency as a cue to final stop voicing.
Stimuli: Synthetic CVC stimuli.
Languages: English, Japanese and Mandarin.
Result: F1 offset frequency produced only small perceptual effects.

[Figure panels: Voiced, Voiceless]

Reason
Perceptual use of F1 may be less affected by native-language experience, or relatively easy to learn. F1 offset may be a universal cue, or it may be a language-specific cue that is more easily learned by non-native speakers. (Studies involving very inexperienced subjects would be needed to verify this.)

Hillenbrand et al. (1984)


Stimuli: Naturally produced speech.
Result: When VDs were approximately equal, utterances containing falling F1 transitions and a low F1 offset frequency were judged as voiced.
Vowel context: The effect was greatest for open vowels with a high-F1 steady state and least for more constricted vowels such as /i/ and /u/ with low-F1 steady-state values.
High-F1 steady state: formant transition and offset are good cues.

Fischer and Ohde (1990)


The purpose of their study was to investigate the effect of vowel context on the perception of three cues to voicing in final velar stop consonants: vowel duration, F1 offset frequency and rate of F1 offset frequency change.

Synthetic CVC stimuli were used. They found that both VD and the frequency of F1 offset affected listeners' perception of voicing class.

[Figure: stimuli for the //, /I/, /i/ and // contexts; F1 values shown: 650, 600, 200 and 400 Hz]

Vowel context
The influence of the F1 transition-offset frequency on voicing perception appeared to be related to vowel context, specifically, the frequency of F1 steady-state value for a particular vowel.

/i/- constricted vowel


The F1 offset frequency did not have as substantial an effect on voicing perception for the vowel /i/, with its low F1 steady state, suggesting that the spectral cue was less salient for the constricted vowel than for either // or /I/.

Production constraints
Production constraints restrict the extent of frequency change in vowel transition offset relative to vowel steady state for /i/ compared to // and /I/.

In English the extent of the F1 transition is greater for the voiced than for the unvoiced stop consonants (Liberman, Delattre and Cooper, 1959)

Walsh et al. (1987)


Studied the effect of transition rate for vowels ranging from 100 to 250 ms in duration, and found that faster transition rates elicited perception of voiced sounds.

The faster the rate, the more voiced the consonant sounds.

Contrary to the results of Walsh et al. (1987), Fischer and Ohde found that the fastest transition rate of 10 Hz/ms did not necessarily elicit the perception of voiced sounds.

Fischer & Ohde (1990)

2ND STUDY

Rate of frequency change


When F1 onset and offset frequencies were held constant, rate of frequency change was neither a reliable nor an important cue for voicing perception in final velar stop consonants.
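With the start and end frequencies of an F1 transition held constant, the transition rate can only change through the transition's duration: rate = |end - start| / duration, in Hz/ms. A minimal sketch of that arithmetic; the helper `transition_rate` and the 700-Hz and 200-Hz values are hypothetical, chosen so that a 50-ms transition reaches the 10 Hz/ms rate mentioned for Fischer and Ohde.

```python
def transition_rate(start_hz: float, end_hz: float, duration_ms: float) -> float:
    """Absolute rate of formant frequency change in Hz/ms (hypothetical helper)."""
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    return abs(end_hz - start_hz) / duration_ms


# Hypothetical F1 fall from a 700-Hz steady state to a 200-Hz offset:
# the extent (500 Hz) is fixed, so shorter transitions mean faster rates.
for dur in (50, 100, 250):
    print(dur, "ms ->", transition_rate(700, 200, dur), "Hz/ms")
```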

Fischer and Ohde (1990)


Transition rate had little effect on voicing perception: differences in voicing ratings for vowels of equal duration were quite small.

The following figure shows the mean rating of final-consonant voicing as a function of vowel duration and of the rate of the F1 offset transition.

Low vowel // with a high-F1 steady-state value; high vowel /i/ with a low-F1 steady-state value.

Panels: // with a 500-Hz F1 offset; // with a 200-Hz F1 offset; /i/ with a 200-Hz F1 offset.

For the // continua with the 500-Hz F1 offset frequency, transition rate did not significantly influence voicing perception at any vowel duration.

The other two vowel continua, both with a lower F1 offset frequency of 200 Hz, demonstrated some influence of transition rate for syllables in the 200-300 ms range, as previously reported by Walsh et al. (1987).

Stimuli with the fastest transition rate generally elicited the lowest voicing ratings.

Krause (1982a)
Krause (1982a) examined the development of vowel length as a cue to phonological voicing in postvocalic stops among children. She synthesized three monosyllabic spectral configurations to represent the pairs bip/bib, pot/pod and back/bag; pot/pod and back/bag contained low F1 transitions. Listeners were 3- and 6-year-old children and adults. Result: as age increased, shorter VDs were sufficient to change perception from voiceless (VL) to voiced (V).

One group of children always labeled the back/bag stimuli as bag; a second group always labeled the bip/bib stimuli (level F1 transition) as bip. Krause concluded that, for some children, the presence of an F1 offset transition may always cue a voiced consonant and the absence of an F1 transition may always cue a voiceless stop, independent of VD.

PLACE OF ARTICULATION SPECTRAL CUE

Effect of following vowel tilt on CV perception

VOWEL TILT

The onset spectrum of the stop changes relative to the following vowel

When does the tilt act as a strong cue?


When formant information is ambiguous and intermediate between /ba/ and /da/, spectral tilt information has a substantial influence on identification of the consonant (Lahiri et al., 1984).

Vowel tilts that are shallower (more positive) than the consonant onset tilt are expected to result in more labial responses; vowel tilts that are steeper (more negative) than the consonant onset tilt are expected to result in more alveolar responses. A -6 dB/octave consonant onset tilt diverged to the different vowel tilts.

Twenty-one subjects identified /ba/ or /da/ in a series of 40 CVs that varied along F2-onset frequency in eight steps and along the spectral tilt of the following vowel in five steps ranging from -12 to 0 dB/octave.

Short-term spectra for the first four pitch pulses of the stimuli.

Stimuli with an F2-onset frequency of 1400 Hz. The first panel yields more labial responses and the second panel more alveolar responses, despite identical onset spectra, because the change in tilt differs. In the first panel, tilt becomes shallower until it reaches a flat spectrum that is sustained for the duration of the vowel, whereas in the bottom panel, tilt becomes steeper until it reaches a steeply negative spectrum for the vowel.

The first panel shows a relative flattening of spectral tilt (-6 to 0 dB/octave) from consonant onset (t = 0 ms) to vowel steady state (t = 30 ms). This pattern of change is predicted to increase the perception of a labial stop consonant. In contrast, the second panel is predicted to increase the perception of an alveolar stop consonant because spectral tilt becomes steeper over the course of the consonant transition.
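Spectral tilt in dB/octave can be estimated as the slope of a straight line fitted to spectral level against log2 frequency; the tilt-change prediction (shallower vowel tilt favors labial, steeper favors alveolar) then reduces to comparing two slopes. A minimal sketch, assuming this least-squares definition of tilt, which is not necessarily how the stimuli in this study were measured; the helpers `spectral_tilt` and `predicted_bias` and the example spectra are hypothetical.

```python
import math


def spectral_tilt(freqs_hz, levels_db):
    """Least-squares slope of level (dB) against log2(frequency), i.e. dB/octave."""
    xs = [math.log2(f) for f in freqs_hz]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(levels_db) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, levels_db))
    return sxy / sxx


def predicted_bias(onset_tilt, vowel_tilt):
    """Hypothetical rule: compare vowel tilt with consonant onset tilt."""
    if vowel_tilt > onset_tilt:   # shallower (more positive) vowel tilt
        return "labial"
    if vowel_tilt < onset_tilt:   # steeper (more negative) vowel tilt
        return "alveolar"
    return "no tilt-based bias"


freqs = [250, 500, 1000, 2000, 4000]
onset_levels = [60, 54, 48, 42, 36]   # hypothetical: falls 6 dB per octave
vowel_flat = [50, 50, 50, 50, 50]     # hypothetical flat vowel spectrum
vowel_steep = [60, 48, 36, 24, 12]    # hypothetical: falls 12 dB per octave

onset = spectral_tilt(freqs, onset_levels)
print(predicted_bias(onset, spectral_tilt(freqs, vowel_flat)))
print(predicted_bias(onset, spectral_tilt(freqs, vowel_steep)))
```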

Mean data for the experiment, in which the probability of responding /da/ as a function of F2-onset frequency is plotted separately for each vowel tilt. Maximum likelihood fits of the identification functions are displayed for the mean data at each vowel tilt as different lines (see the legend).
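One common way to obtain such fits is to model the probability of a /da/ response as a two-parameter logistic function of F2-onset frequency and maximize the binomial likelihood. The sketch below assumes that model (the study's own fitting procedure may differ) and fits it with Newton's method in plain Python. The helper `fit_identification` and the response counts, which are synthetic values generated from a known curve with a 1350-Hz boundary, are hypothetical.

```python
import math


def fit_identification(x, n_trials, n_da, iters=50, tol=1e-10):
    """Maximum-likelihood logistic fit of P(/da/) against x (hypothetical helper).

    Returns (intercept, slope, boundary); boundary is where P(/da/) = 0.5.
    """
    m = len(x)
    xm = sum(x) / m
    xs = math.sqrt(sum((v - xm) ** 2 for v in x) / m)  # standardize for stability
    z = [(v - xm) / xs for v in x]
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = s00 = s01 = s11 = 0.0
        for zi, n, k in zip(z, n_trials, n_da):
            p = 1.0 / (1.0 + math.exp(-(a + b * zi)))
            r = k - n * p            # score (gradient) contribution
            w = n * p * (1.0 - p)    # Fisher information weight
            g0 += r
            g1 += r * zi
            s00 += w
            s01 += w * zi
            s11 += w * zi * zi
        det = s00 * s11 - s01 * s01
        da = (s11 * g0 - s01 * g1) / det   # Newton step: information^-1 * score
        db = (s00 * g1 - s01 * g0) / det
        a += da
        b += db
        if abs(da) < tol and abs(db) < tol:
            break
    slope = b / xs                   # convert back from standardized units
    intercept = a - b * xm / xs
    return intercept, slope, -intercept / slope


# Synthetic counts from a known logistic curve (boundary 1350 Hz, slope 0.01 per Hz).
f2_onsets = [1000 + 100 * i for i in range(8)]  # hypothetical eight-step continuum
trials = [1000] * 8
true_p = [1 / (1 + math.exp(-0.01 * (f - 1350))) for f in f2_onsets]
da_counts = [round(t * p) for t, p in zip(trials, true_p)]

intercept, slope, boundary = fit_identification(f2_onsets, trials, da_counts)
print(round(boundary, 1), round(slope, 4))
```

Fitting one such curve per vowel tilt and comparing the boundaries is one way to express how far tilt shifts the /ba/-/da/ category boundary.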

The experiment demonstrated the relative influence of spectral tilt change as a perceptual cue to stop consonant identification in a CV context without bursts.

Source: Alexander & Kluender (2008)

FORMANT TRANSITION

Revoile (1981)

1ST STUDY

Revoile (1981)
Studied switched and deleted vowel transitions. Switching the vowel transition resulted in listeners perceiving the voicing characteristics of the following stop as those of the stop in the syllable in which the vowel transition was produced.

Deletion of the vowel transition (VT) impaired the overall identification of voicing in the following stop.

The acoustic shape of the formant transition varies as a function of the following vowel; therefore, vowel-formant transitions are necessarily context-dependent cues for stop consonants.


Sharf and Beiter (2006)


The effect of coarticulation on perception was studied by asking 18 subjects to identify the place of articulation of stop consonants from CV and VC vowel transitions played forward and backward. The factors investigated were: (A) transition direction, forward and backward; (B) transition position, CV and VC; (C) manner of production; (D) place of production.

The results indicate that: with the exception of voiceless stops identified from forward CV transitions, consonants were identified considerably better than chance from CV and VC vowel transitions. More correct identifications of consonants were made from VC transitions than from CV transitions in both the forward and backward play conditions. Backward play CV transitions produced much higher identification scores than those played forward, and

Listeners derive more information from transitions when they are pre-consonantal than when they are postconsonantal.

Task: Subjects (17) checked the appropriate consonant on a list provided. Factors studied: 1) transition position (CV/VC), 2) voiced/voiceless, 3) place (alveolar/labial/velar). The neutral vowel // was used with each stimulus, and aspiration of the stop release was removed.

Subtests
- CV transitions: voiced stops and voiceless stops
- VC transitions: voiced stops and voiceless stops

Transition position & Voiced versus VL

Labial > Palatal > Alveolar

Place of production
Labial consonants: transitions from the neutral vowel are neutral or negative; that is, the second formant is not higher in frequency than in the steady-state portion of the vowel.
Palatal and alveolar consonants: transitions from the neutral vowel are positive; that is, the second formant frequency increases relative to the vowel steady state.
Negative formant transitions produce fewer confusions than positive ones, since the former indicate only the labial place.

The discrimination of vowel duration by infants (Rebecca E. Eilers, Dale H. Bull, D. Kimbrough Oller and Diana C. Lewis)

INFANT PERCEPTION

VD perception in infancy
Three groups of nine 5 to 11-month-old infants provided evidence of discrimination of speech like stimuli differing only in vowel duration. Ease of discrimination was directly related to the magnitude of the ratio of the longer to shorter vowel.

Infants were tested for discrimination of synthetic stimuli differing by 100-, 200- and 300-ms vowel duration in one-, two- and three-syllable contexts, and on a final-position stop voicing contrast cued by voice excitation only.

In all cases, the contrasting durations were carried by the last vowel of the synthetic word.

- Group one: discriminated three vowel duration contrasts (with ratios of 0.33, 0.67 and 1.0) embedded in a synthetic /mad/ syllable.
- Group two: discriminated the same duration contrasts within the bisyllable /samad/.
- Group three: discriminated them within the trisyllable /masamad/.

House and Fairbanks (1953) showed that the voicing of final consonants in English is cued by a 69% increase in the duration of the vowel preceding a voiced consonant. Since infants showed fairly good discrimination of VD increments of 67%, they may be able to make phonemic discriminations of final-consonant voicing in English.
Conclusion: the dominant cue for final-consonant voicing is the relative duration of the pre-consonantal vowel.
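The duration ratios above can be read as proportional lengthening of the longer vowel, (long - short) / short, which puts the infant contrasts and the House and Fairbanks figure on the same scale. The sketch assumes a 300-ms base vowel; that value is inferred from the reported 100/200/300-ms increments and 0.33/0.67/1.0 ratios, not stated in the source. The 200-ms vowel in the last line and the helper `relative_increment` are hypothetical.

```python
def relative_increment(short_ms: float, long_ms: float) -> float:
    """Proportional lengthening of the longer vowel: (long - short) / short."""
    return (long_ms - short_ms) / short_ms


BASE_MS = 300  # assumed base vowel duration, inferred from the reported ratios
for increment in (100, 200, 300):
    ratio = relative_increment(BASE_MS, BASE_MS + increment)
    print(f"+{increment} ms -> {ratio:.2f}")

# Hypothetical vowel: a 200-ms vowel lengthened by 69% before a voiced stop.
print(relative_increment(200, 338))
```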

These same three infant groups failed to provide evidence of discrimination of a final-position released stop consonant contrast (/mat/ versus /mad/) cued by voice excitation during the closure of the /d/ and not the /t/.

Thank You

Tense vowels are longer than lax vowels, and low vowels are longer than high vowels. http://dspace.vidyanidhi.org.in:8080/dspace/bitstream/2009/1389/7/UOM-1996-041-6.pdf


Production differences
The vocal folds abduct sooner in the productions with voiceless final stops.

Organization of the whole word differs as a function of the voicing of the final stop. The jaw lowers faster & moves to more open positions in syllables with voiceless, rather than voiced, final stops (Gracco, 1994; Summers, 1987).
In words with voiceless final stops, the jaw is also quicker to close (Summers, 1987), and the tongue is quicker to move away from its vowel-related posture (Raphael, 1975). For words with voiceless final stops, laryngeal vibration ceases before vocal-tract closure is achieved, whereas laryngeal vibration continues into closure for words with voiced final stops. All these articulatory differences create numerous acoustic differences between words with voiced and voiceless final stops: words with voiceless final stops have shorter vocalic portions than words with voiced final stops.

Studies which do not support vowel duration

F0
A lower steady-state F0 and a lower offset F0 both increased voiced labeling responses.

Fischer & Ohde (1990)


Low F1 steady state: less voicing effect. High F1 steady state: more voicing effect.
[Figure: panels labeled "voiceless" and "voiced"; axis running from higher F2 to lower F2]

Walsh & Parker (1983)


Relative importance of VD and offset cues: they proposed that, while the vowel offset transition was the primary cue to voicing in word-final stops, vowel duration could override offset cues at extremely long or short vowel lengths.

[Figure: along the vowel length continuum, VD dominates at both extremes and offset cues in between]

Spectral versus temporal cues


Thus, these findings indicate that spectral cues to voicing can override vowel duration cues.

Fruchter and Sussman (1997)


Effect of carryover: identification curves were estimated for the English consonants /b, d, g/ using five-formant synthetic CV stimuli comprehensively sampling the F2 onset × F2 vowel acoustic space in the vicinity of /b, d, g/.

The stimuli included 10 English vowel contexts, 11 levels of F2 onset per vowel and 3 levels of F3 onset, orthogonally varied with the F2 variables (10 vowels × 11 F2 onsets × 3 F3 onsets = 330 stimuli). In order of ascending F2, the vowels were /, o, a, , u, , , I, e, i/. Task: Subjects were asked to identify each stimulus as most similar to /b/, /d/, /g/, /w/ or no consonant.
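The 330-stimulus count follows from fully crossing the three orthogonal factors. A minimal sketch with placeholder labels, since the vowel symbols and the actual F2/F3 onset values are not given here; the names `VOWELS`, `F2_ONSET_LEVELS` and `F3_ONSET_LEVELS` and their level indices are hypothetical.

```python
from itertools import product

VOWELS = [f"V{i}" for i in range(1, 11)]  # 10 vowel contexts (placeholder labels)
F2_ONSET_LEVELS = range(1, 12)            # 11 F2-onset levels per vowel (indices only)
F3_ONSET_LEVELS = range(1, 4)             # 3 F3-onset levels (indices only)

# Every combination of vowel x F2 onset x F3 onset is one synthetic stimulus.
stimuli = list(product(VOWELS, F2_ONSET_LEVELS, F3_ONSET_LEVELS))
print(len(stimuli))  # 330
```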

Across tokens various parameters were kept constant (stimuli were burstless).

[Figure: stimulus schematic, 300 ms]

The second-formant transition sampled at its onset is the F2 onset; the frequency of the second formant sampled in the middle of the following vowel is the F2 vowel, for a coarticulated consonant. The results strongly indicate that F2 onset and F2 vowel, in combination, serve as important cues for stop consonant place of articulation.

Tradeoff between F2 transition and burst

Following the pattern of tradeoffs between F2 transition cues and burst cues described by Dorman et al. (1977), the burst carries most weight in those situations in which the F2 vocalic transitional cues are not really distinctive: /b/ versus /d/ in front vowel contexts and /d/ versus /g/ in back vowel contexts.

As with the burst, F3 carries most weight when the F2 vocalic transitional cues are not distinctive.

Effects of F3
There was no effect of the F3 condition on /b/ identification in back vowel contexts or on /g/ identification in front vowel contexts, whereas there were effects on /d/ versus /g/ identification in back vowel contexts and on /b/ versus /d/ identification in front vowel contexts.

Comparing patterns of F3
F3 effects occur in those regions in which the different stop places of articulation overlap; there were none for back-vowel /b/ and front-vowel /g/. There are tradeoff consequences between the overlapping stops in the region of their overlap: /d/ and /g/ overlap in back vowel space, /b/ and /d/ in front vowel space. These tradeoffs are in the natural directions, with g-like F3 elevating /g/ versus /d/ identifications and b-like F3 elevating /b/ versus /d/ identifications.

Sussman et al. (1991)


The stimuli were initial stops /b, d, g/ followed by a vowel; the final consonant was always /t/. Vowels used were / I, i, , , o, , , u, a/.

Results of discriminant analysis showing percent classification of place of articulation. Predictor variables were F2 onset and F2 vowel frequencies.

Results of discriminant analysis showing percent classification of place of articulation, across all vowel contexts. Predictor variables were F2 onset and F2 vowel frequencies. Sussman (1991)

Clearly, additional information (F3) is needed to discriminate /d/ versus /g/ in back vowel contexts.

Acoustic space: F2 onset × F2 vowel


The F2 transition acoustic space is defined by the frequency coordinates F2 onset and F2 vowel. The plot shows the patterning of labial, alveolar and velar data and the areas of intersection between stop categories. Labials are the most distinct category, especially in back vowel contexts; velars are most distinct in front vowel contexts.

Group mean scatterplots
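Within the F2 onset × F2 vowel space, one standard way to summarize a single stop category is a least-squares line relating F2 onset to F2 vowel, known as a locus equation, which Sussman and colleagues used to characterize stop place. A minimal sketch, assuming ordinary least squares; the helper `locus_equation` and the token values, which lie exactly on a line purely for illustration, are hypothetical and are not data from Sussman et al. (1991).

```python
def locus_equation(f2_vowel, f2_onset):
    """Least-squares line F2onset = k * F2vowel + c for one stop category."""
    n = len(f2_vowel)
    mx = sum(f2_vowel) / n
    my = sum(f2_onset) / n
    sxx = sum((x - mx) ** 2 for x in f2_vowel)
    sxy = sum((x - mx) * (y - my) for x, y in zip(f2_vowel, f2_onset))
    k = sxy / sxx        # slope
    c = my - k * mx      # y-intercept (Hz)
    return k, c


# Hypothetical tokens for one stop across vowel contexts (real tokens scatter).
f2_vowel = [900, 1200, 1500, 1800, 2100, 2400]
f2_onset = [0.5 * v + 900 for v in f2_vowel]
k, c = locus_equation(f2_vowel, f2_onset)
print(k, c)
```

Fitting one line per stop gives each category a slope and intercept, which can then be compared across /b/, /d/ and /g/.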

How do transitions vary depending on the place of articulation and the vowel context?

Bilabials
Transitions are longer before unrounded than before rounded vowels.

Apical stops
The distance between the point of occlusion and vowel target configuration varies, so we can expect both devoiced and voiced transitions to be more effective cues to /d/ before back vowels, where transitions are relatively long, than before front vowels, where transitions are short.

Velars
The determining factor is the degree of similarity between the velar tongue constriction and that of the following vowel; in general, close vowels such as /i/ have little transition, whereas open vowels such as /a/ have marked transitions.

The results of this study indicated a tendency toward reciprocal performance on bursts and transitions, where the perceptual weight of one increased as that of the other decreased.

When transitions are brief (/b/ before rounded vowels, /d/ before high front vowels, /g/ before close vowels), the burst lies near the major spectral peak of the following vowel and contributes significantly to the perceptual outcome.

Where transitions are extensive (/b/ before middle, unrounded vowels, /d/ before central-back vowels), the burst is distinct from the major spectral peak of the following vowel and contributes little to the perceptual outcome.
