2 Department
Abstract This paper describes the construction and development of a Tamil Text-To-Speech (TTS)
synthesis system that combines syllable and diphone units. Initially, a phone-based TTS was built,
followed by a TTS based on monosyllabic word cluster units. We found that the quality of the
synthesized sentences improves when polysyllabic word units are used (where suitable units are
available), since the effects of co-articulation are preserved in such cases. Hence, we engineered a
Tamil TTS with syllabic units that contains cluster units of more than one kind (monosyllable,
bisyllable and trisyllable). However, polysyllabic units alone failed to improve the TTS in areas such
as sentence endings, scientific notation, website links and email addresses; these lagging areas can
be processed effectively with diphone units. This research achieves effective concatenation with
prosody by combining syllables and diphones. The first (n-1) words of a sentence are processed with
polysyllabic units for better quality, whereas the end of the sentence (the nth word), scientific
notation, website links and email addresses are processed with diphones. Implementing both syllables
and diphones in Tamil speech synthesis requires two different corpus tables. Preliminary listening
tests indicated that the combined syllabic-word and diphone TTS yields higher quality.
Keywords: Speech synthesis system, Text to speech (TTS), Prosody, Diphone, Syllable.

1. Introduction:
There are numerous techniques for improving a speech synthesis system. One of the well-known
approaches for speech recognition is the Hidden Markov Model (HMM). The phoneme types, syllable
patterns, and inflectional characteristics of a language determine the kind of technique to be used
for synthesis. The unique characteristics of a language are analyzed from the order of prevalence of
the phonemes, syllable patterns and words that comprise the language. A statistical approach is
needed to select which class of units to use for speech synthesis. The statistical language model
helps to detect the occurrence of phones, syllable patterns and words in the Tamil language.

2. Issues found in diphones
A diphone is defined as two connected half phones and describes the transition between two phones,
starting in the middle of the first phone and ending in the middle of the second phone. It captures
the coarticulation effects and minimizes the discontinuities at the concatenation points. Diphones
are comparatively bigger units than phones. There are about one thousand to two thousand diphones in
the Tamil language. Unlike phones, they do not have allophonic variations, i.e., each diphone has
only one instance of pronunciation. Diphone concatenation can produce speech of reasonable quality.
However, a single example of each diphone is not enough to produce good-quality speech. Moreover,
diphone-based synthesizers need elaborate prosody rules to produce natural speech. Diphones cannot
capture co-articulation better than recent methodologies. As noted in existing work [4], the number
of concatenation points is comparatively high, so a large database is needed to store the corpus
data.

3. Issues found in Syllables:
6. Proposed system:
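The unit routing summarized in the abstract (polysyllabic units for the first n-1 words; diphones for the final word, scientific notation, website links and email addresses) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `is_special` and `route_units`, the regular expressions used to detect the special token classes, and the romanized placeholder words in the usage note are all assumptions made for illustration, since the paper does not say how these tokens are detected.

```python
import re

# Hypothetical detectors; the paper does not specify how these token
# classes are recognized, so these patterns are illustrative assumptions.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")
URL_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
SCI_RE = re.compile(r"^[+-]?\d+(\.\d+)?[eE][+-]?\d+$")


def is_special(token):
    """Return True for email addresses, website links, or scientific notation."""
    return bool(
        EMAIL_RE.match(token) or URL_RE.match(token) or SCI_RE.match(token)
    )


def route_units(sentence):
    """Label each word with the unit type that would synthesize it.

    The final word and any special token are routed to diphone units;
    all other words are routed to syllable units.
    """
    # Drop trailing sentence punctuation, then discard empty tokens.
    words = [tok.rstrip(".,;:!?") for tok in sentence.split()]
    words = [w for w in words if w]
    last = len(words) - 1
    return [
        (w, "diphone" if i == last or is_special(w) else "syllable")
        for i, w in enumerate(words)
    ]
```

For example, `route_units("idhu oru sodhanai")` labels the first two placeholder words as `"syllable"` and the last as `"diphone"`. In a full system each label would select one of the two corpus tables mentioned in the abstract; that lookup is outside the scope of this sketch.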
IEEE proceedings, 2011. DOI: 10.1109/NCC.2011.5734737.