
Voice Morphing

Voice morphing means the transition of one speech signal
into another. Voice morphing, which is also referred to as
voice transformation or voice conversion, is a technique to
modify a source speaker's speech utterance so that it sounds
as if it were spoken by a target speaker.

The core process in a voice morphing system is the
transformation of the spectral envelope of the source speaker
to match that of the target speaker. Linear transformations
estimated from time-aligned parallel training data (the same
sentences recorded by both speakers) are commonly used to
achieve this.
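The estimation step can be sketched as an ordinary least-squares fit. This is a minimal sketch, assuming the source and target frames are already time-aligned (frame t of one matches frame t of the other) and that each frame is a short list of spectral-envelope coefficients. The function names `solve`, `fit_linear_transform` and `convert_frame` and the toy data are hypothetical and serve only as illustration.

```python
# Minimal sketch (assumption): frames are lists of spectral-envelope
# coefficients, and src[t] is already time-aligned with tgt[t].
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b with Gauss-Jordan elimination and partial pivoting."""
    n = len(a)
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]  # augmented [a | b]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("singular system: training frames are not varied enough")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def fit_linear_transform(src: Matrix, tgt: Matrix) -> Matrix:
    """Least-squares W such that tgt[t] ~= W @ (src[t] + [1]) for every aligned frame t."""
    x = [frame + [1.0] for frame in src]  # append a constant for the bias term
    d_in, d_out = len(x[0]), len(tgt[0])
    # Normal equations: (X^T X) W^T = X^T Y
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(d_in)] for i in range(d_in)]
    xty = [[sum(xr[i] * yr[k] for xr, yr in zip(x, tgt)) for k in range(d_out)]
           for i in range(d_in)]
    w_t = solve(xtx, xty)  # shape d_in x d_out
    return [[w_t[i][k] for i in range(d_in)] for k in range(d_out)]  # transpose


def convert_frame(w: Matrix, frame: List[float]) -> List[float]:
    """Map one source-speaker frame towards the target speaker."""
    x = frame + [1.0]
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


# Toy parallel data: the "target speaker" is a hidden affine map of the source.
src = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]]
tgt = [[2 * a + 0.5 * b + 1, -a + b - 2] for a, b in src]
W = fit_linear_transform(src, tgt)
print(convert_frame(W, [3.0, -1.0]))  # close to [6.5, -6.0]
```

Because the toy targets are an exact affine function of the sources, the fit recovers that mapping. Solving the normal equations by hand keeps the sketch dependency-free. In practice, a numerical library's least-squares solver would be the more robust choice.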

Applications
{some of the applications}
1. Text To Speech (TTS)

2. In public speech systems

3. For special effects (just as video or image morphing is
done)

4. To diminish ethnic barriers.

{Focus on TTS the most…}

Text To Speech (TTS):

A text-to-speech (TTS) system converts normal language text
into speech; other systems render symbolic linguistic
representations, such as phonetic transcriptions, into speech.
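The step from normal text to a phonetic transcription can be illustrated with a dictionary lookup. This is a toy sketch, assuming a hypothetical three-word lexicon written with ARPAbet-style symbols without stress marks. Real front ends use large pronunciation dictionaries plus letter-to-sound rules for unknown words; here unknown words are only reported.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy pronunciation lexicon (ARPAbet-style symbols, stress omitted).
LEXICON: Dict[str, List[str]] = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
    "voice": ["V", "OY", "S"],
}


def text_to_phones(text: str) -> Tuple[List[str], List[str]]:
    """Normalize text, look each word up, and return (phones, unknown_words)."""
    words = re.findall(r"[a-z']+", text.lower())  # lowercase, drop punctuation
    phones: List[str] = []
    unknown: List[str] = []
    for word in words:
        if word in LEXICON:
            phones.extend(LEXICON[word])
        else:
            unknown.append(word)  # a real system would apply letter-to-sound rules here
    return phones, unknown


print(text_to_phones("Hello, world!"))
```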

Synthesized speech can be created by concatenating pieces of
recorded speech that are stored in a database.
{That is, we keep a database of the different phonetic units
and substitute the ones that correspond to our text.}
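The database-and-substitute idea can be sketched in a few lines. This sketch assumes a hypothetical unit database that maps each phonetic unit to a tiny list of samples. It also adds a short linear cross-fade at each joint, a common way to soften clicks at unit boundaries. The choice of cross-fade and the parameter name `overlap` are assumptions, not part of the original description.

```python
from typing import Dict, List

# Hypothetical unit database: each phonetic unit maps to a (very short) recorded waveform.
UNIT_DB: Dict[str, List[float]] = {
    "HH": [0.0, 0.1, 0.1, 0.0],
    "AH": [0.0, 0.5, -0.5, 0.0],
    "L": [0.0, 0.3, -0.3, 0.0],
    "OW": [0.0, 0.4, -0.4, 0.0],
}


def concatenate_units(units: List[str], db: Dict[str, List[float]], overlap: int = 1) -> List[float]:
    """Look up each unit and join the waveforms, cross-fading `overlap` samples at each joint."""
    out: List[float] = []
    for unit in units:
        samples = db[unit]  # a KeyError means the unit is missing from the database
        k = min(overlap, len(out), len(samples))
        for i in range(k):
            w = (i + 1) / (k + 1)  # fade-in weight for the incoming unit
            j = len(out) - k + i
            out[j] = (1 - w) * out[j] + w * samples[i]
        out.extend(samples[k:])
    return out


speech = concatenate_units(["HH", "AH", "L", "OW"], UNIT_DB)
print(len(speech))
```

With `overlap=0`, the function reduces to plain back-to-back concatenation of the stored pieces.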

Systems differ in the size of the stored speech units; a
system that stores phones or diphones provides the largest
output range, but may lack clarity.

{‘Phones’ and ‘diphones’ are phonetic terms here: a phone is
an individual phonetic element, while a diphone is an
adjacent pair of phones. Google for more info…}
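Since a diphone is an adjacent pair of phones, a phone sequence maps directly onto the diphone units a diphone database must supply. This is a small sketch. The silence symbol `SIL` and the `A-B` naming are hypothetical conventions. Padding with silence at both ends gives the first and last phones a transition too.

```python
from typing import List


def phones_to_diphones(phones: List[str], silence: str = "SIL") -> List[str]:
    """Return the adjacent-pair (diphone) units needed to cover a phone sequence."""
    padded = [silence] + phones + [silence]  # hypothetical silence padding at both ends
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


print(phones_to_diphones(["HH", "AH", "L", "OW"]))
```

A language with N phones needs at most N × N diphone types. For example, about 40 phones gives at most 1,600, which is still a small, closed inventory. This is why such systems can say almost anything, at some cost in clarity at the joins.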

For specific usage domains, the storage of entire words or
sentences allows for high-quality output. Alternatively, a
synthesizer can incorporate a model of the vocal tract and
other human voice characteristics to create a completely
"synthetic" voice output.
Fig - List of different TTS systems
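The vocal-tract alternative can be illustrated with a source-filter sketch. A pulse train stands in for the vocal folds, and a cascade of second-order resonators stands in for the vocal-tract formants. This is a deliberately crude sketch. The resonator follows the classic Klatt-style difference equation. The formant frequencies in the usage line are approximate values for an /a/-like vowel, and the bandwidths are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def impulse_train(f0: float, fs: int, n: int) -> List[float]:
    """Crude glottal source: one unit impulse per pitch period."""
    period = int(round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator(x: Sequence[float], freq: float, bw: float, fs: int) -> List[float]:
    """Second-order digital resonator modelling one formant (Klatt-style coefficients)."""
    t = 1.0 / fs
    c = -math.exp(-2 * math.pi * bw * t)
    b = 2 * math.exp(-math.pi * bw * t) * math.cos(2 * math.pi * freq * t)
    a = 1 - b - c  # unity gain at 0 Hz
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants: List[Tuple[float, float]], f0: float = 120.0,
                     fs: int = 16000, duration: float = 0.2) -> List[float]:
    """Pass an impulse-train source through cascaded (frequency, bandwidth) resonators."""
    signal = impulse_train(f0, fs, int(round(fs * duration)))
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalize to [-1, 1]


# Illustrative /a/-like formants (Hz) with assumed bandwidths (Hz).
wave = synthesize_vowel([(730, 60), (1090, 90), (2440, 120)])
print(len(wave))
```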

Public Speech systems:

In public speech systems, we can make the voice output sound
like that of a popular public speaker.

{There are a lot of advantages to this, for example:

1. The public speaker doesn’t need to be physically present.

2. We can implement it in many places (in fact, railway
announcements use a very crude form of the same idea).

3. Cost efficiency.}

Special Effects

Video and image morphing are used extensively for film and
graphical special effects. Similarly, we can enrich the
multimedia animation experience by morphing the audio at the
same time as the images or video.
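One simple way to keep the audio morph in step with an image morph is to drive both from the same morph factor per video frame. This sketch assumes spectral envelopes are lists of log-magnitude values (dB) and uses plain linear interpolation between the source and target envelopes. That is an illustrative choice, not necessarily how a production morphing system blends voices. The function names are hypothetical.

```python
from typing import List


def morph_envelopes(source: List[float], target: List[float], alpha: float) -> List[float]:
    """Linear interpolation: alpha=0 gives the source envelope, alpha=1 the target."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [(1 - alpha) * s + alpha * t for s, t in zip(source, target)]


def morph_schedule(num_video_frames: int) -> List[float]:
    """One morph factor per video frame, so audio and image morphs progress together."""
    if num_video_frames < 2:
        return [1.0] * num_video_frames
    return [i / (num_video_frames - 1) for i in range(num_video_frames)]


src_env = [0.0, -10.0, -20.0]   # hypothetical source envelope (dB)
tgt_env = [-6.0, -4.0, -30.0]   # hypothetical target envelope (dB)
for alpha in morph_schedule(5):
    print(alpha, morph_envelopes(src_env, tgt_env, alpha))
```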

To diminish ethnic barriers

Through voice morphing, we can provide accent correction
and even translation!

{That is, a German engineer can instruct a Chinese worker,
an American caller can better understand an Indian call
center employee, etc.}

Ethnic barriers and small talk greatly hinder effective
communication. Through voice morphing, we can improve
communication and thus, ultimately, throughput.

Limitations
1. Voice detection relies on sophisticated 3D renderings,
but there are a lot of normalization problems {that is,
extracting the meaning of / understanding the sound is
difficult}
2. Some applications require extensive sound
libraries.
3. Different languages require different phonetics, so
updating or extending the system is tedious.
4. The database is very seldom complete {we may not be
able to add every piece of small talk or every phonetic
unit to the database}.
