ABSTRACT
The article presents the results of an examination of sentence intonation in Polish. The
tests were performed for the project “Development of multi-voice and multi-language
Text-to-Speech (TTS) and Speech-to-Text (STT) conversion system (language:
Belarussian, Polish, Russian)”. It gives a short introduction to prosody, with particular
stress on syntax (simple and compound sentence structure). The examined material
consisted of recordings of a text read aloud by four different people (two women and
two men). The article also presents the process of analysis and future plans.
STRESZCZENIE (SUMMARY)
The article presents the results of a study of sentence intonation in Polish, carried out
for the project „Syntezer mowy polskiej na podstawie tekstu” (Polish text-to-speech
synthesiser). It gives a short introduction to prosody, with particular attention to
syntax, that is, the structure of simple and compound sentences. The research material
consisted of recordings of a text read by four different people (two women and two
men). The article also presents the method of analysis and the directions of further work.
1. Introduction
The research aims to fill a gap in the introduction and promotion of computerised
speech technology for the Polish language. The decisive factor in achieving high-quality
speech synthesis is the completeness of the resources and databases used. The
research objective is to develop the linguistic resources, vocabulary, grammar and
acoustic databases. The synthesis of the phonemic characteristics of speech is based on
the Allophones Natural Waves method. The basic principle of synthesising the
prosodic features of speech is the division of an utterance into accent groups and the
formation, on their basis, of the entire tonal, rhythmical and dynamic contours of a
syntagm and of the utterance as a whole. Using a data-driven approach, the speech
synthesiser will draw on prosodic feature databases for the synthesis of speech sounds
and intonation. Together, the two modules are expected to achieve a high quality of
synthesised speech.
For synthesised speech to sound natural, it needs to have rhythm and, more importantly,
proper intonation. The way people speak differs depending on their ability
to produce utterances in a particular way. The voice signal is described using numerous
PTFonR07:PTFonR07 2008-05-06 15:50 Strona 80
physical parameters which vary as the speech goes on. The acoustic parameter that is
particularly important in prosodic analysis is the fundamental frequency (F0) and its
curve (the intonation contour). That is why special emphasis is placed on determining
the intonation contour and the maximum and minimum F0 values for particular types of
utterances.
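To make the notion of F0 concrete, the sketch below estimates the fundamental frequency of a single frame by finding the time shift at which the signal best matches itself (the autocorrelation peak). It is only a minimal illustration and is not meant to reproduce the pitch tracker used in the study; the function name estimate_f0, the search range of 75 to 500 Hz and the synthetic 140 Hz test tone are all assumptions introduced here.

```python
import math

def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate F0 in Hz for one voiced frame from its autocorrelation peak."""
    lag_min = int(sample_rate / f0_max)                        # shortest period searched
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)   # longest period searched
    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # The best-matching shift is taken as one period; 0.0 means no peak was found.
    return sample_rate / best_lag if best_lag else 0.0

# Synthetic test tone (not recorded speech): 0.1 s of a 140 Hz sine sampled at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 140.0 * n / rate) for n in range(800)]
f0 = estimate_f0(tone, rate)
print(f"estimated F0: {f0:.1f} Hz")
```

Repeating such an estimate frame by frame over an utterance yields the sequence of F0 values that forms the intonation contour. Real speech additionally requires a voicing decision and protection against octave errors, which this sketch leaves out.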
2. Kinds of utterances
Syntax (the sentence structure and message) is crucial to the intonation of certain
utterances. One division of utterances is into clauses (sentences) and phrases. Phrases
are groups of words that have either no subject or no predicate, e.g. Przechodzić tylko
na zielonym świetle (Cross only on a green light). A sentence is a group of
grammatically interrelated words containing a subject and a predicate, e.g. Proszę
przechodzić tylko na zielonym świetle (Please cross only on a green light). Sentences
with modifiers are called long simple sentences, whereas the ones without are short
simple sentences. A simple sentence can be as short as one word. Longer utterances with
more than one predicate and/or subject consist of several component sentences.
Depending on the relations between these components, compound and complex sentences
are distinguished.
There are several types of compound sentences:
1. Conjunction – the sentences are joined; the co-ordinating conjunctions may be: i,
oraz, a, jak również, ani etc. (and, as well as etc.);
e.g. Marek był w górach a Ania nad morzem (Mark was in the mountains and Ann
was by the seaside);
2. Negation – one sentence negates the other; the co-ordinating conjunctions may be:
ale, lecz, a, jednak, zaś, natomiast etc. (but, however etc.);
e.g. Kasia była spóźniona jednak się nie śpieszyła (Cathy was late but she was not in
a hurry);
3. Disjunction – one sentence excludes the other; the co-ordinating conjunctions may
be: albo, czy, lub, bądź etc. (or, either etc.);
e.g. Przeczytam książkę albo pójdę do kina (I will read a book or go to the cinema);
4. Implication – the second sentence is a consequence of the first; the co-ordinating
conjunctions may be: więc, toteż, dlatego, zatem etc. (so, that is why, consequently etc.);
e.g. Marek jest zdolny więc ma wysoką średnią (Mark is intelligent so his average is
high).
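The classification above can be sketched as a simple lookup from conjunction to sentence type. The snippet is an illustrative assumption rather than part of the described system: the names COMPOUND_TYPES and candidate_types are hypothetical, and the conjunction lists contain only the words quoted above. Because the conjunction "a" appears under both conjunction and negation, the function returns every matching type instead of forcing a single answer.

```python
# Conjunctions copied from the four types listed above (the lists are not exhaustive).
COMPOUND_TYPES = {
    "conjunction": ["i", "oraz", "a", "jak również", "ani"],
    "negation": ["ale", "lecz", "a", "jednak", "zaś", "natomiast"],
    "disjunction": ["albo", "czy", "lub", "bądź"],
    "implication": ["więc", "toteż", "dlatego", "zatem"],
}

def candidate_types(sentence):
    """Return the compound-sentence types whose conjunctions occur in the sentence."""
    # Lower-case, drop commas and pad with spaces so that only whole words match.
    padded = " " + " ".join(sentence.lower().replace(",", " ").split()) + " "
    found = []
    for kind, conjunctions in COMPOUND_TYPES.items():
        if any(f" {c} " in padded for c in conjunctions):
            found.append(kind)
    return found

print(candidate_types("Przeczytam książkę albo pójdę do kina"))
print(candidate_types("Marek był w górach a Ania nad morzem"))
```

The second call returns both "conjunction" and "negation", which shows why word lookup alone cannot settle the type: the choice between them depends on the message of the sentence.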
In complex sentences the components are not equal. Such sentences contain a main
clause (antecedent) and a subordinate clause (consequent). The subordinate clause
substitutes for or complements one of the parts of the main clause.
Depending on their purpose and emotional undertone, sentences are divided into
declarative, interrogative, imperative and exclamatory ones. In writing, emotions are
usually expressed with punctuation marks such as the dash, question mark, exclamation
mark or ellipsis.
3. Research method
The very first step was to create a suitable text to be recorded. It had to contain the
examined types of sentences, with several sentences of each type.
In addition, the sentences differed from each other in conjunctions and message.
Next, the text was recorded using SoundForge as each of the four people read it aloud.
Each person recorded the text separately, so as not to suggest intonation to
the other participants of the research. For every sentence, a spectrogram (a
spectro-temporal representation of the sound) and an intonation contour were generated
using Praat, a computer program with which phoneticians can analyse, synthesize, and
manipulate speech. Computer analysis provided important data concerning F0
fluctuation and its mean value. Tone contour examination also brought interesting results.
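The F0 measurements mentioned above (minimum, maximum and mean value) can be summarised from an extracted contour as in the following sketch. It assumes the contour is available as a list of F0 values in Hz with 0 marking unvoiced frames; that convention, the function name f0_summary and the sample values are assumptions made for illustration, not data from the recordings.

```python
def f0_summary(contour_hz):
    """Return (min, max, mean) of F0 over voiced frames; 0 marks an unvoiced frame."""
    voiced = [f for f in contour_hz if f > 0]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    return min(voiced), max(voiced), sum(voiced) / len(voiced)

# Hypothetical contour in Hz (not measured data); zeros stand for unvoiced frames.
contour = [0, 110, 125, 140, 0, 120, 105]
f0_min, f0_max, f0_mean = f0_summary(contour)
print(f0_min, f0_max, f0_mean)  # prints: 105 140 120.0
```

Excluding unvoiced frames matters here: counting them as 0 Hz would drag both the minimum and the mean down to meaningless values.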
Spectrograms and tone contours of the sentence “Maciek był w górach a Ania nad
morzem” (Mark was in the mountains and Ann was by the seaside) read by a man and a
woman are presented below. The man clearly has a low-pitched voice: his maximum F0
value is about 140 Hz, whereas the woman's is 243 Hz. The woman's
intonation line is similar to the man's. They differ only in pitch level, which is
normal, as women have higher voices than men.
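One common way to compare the shape of intonation lines independently of how high a voice is pitched is to express F0 in semitones relative to a per-speaker reference. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, each speaker's mean F0 is chosen as the reference, and the two contours are invented so that one is an exact scaled copy of the other. Only the values 140 Hz and 243 Hz come from the text.

```python
import math

def to_semitones(contour_hz, reference_hz):
    """Express each F0 value as a distance in semitones from reference_hz."""
    return [12.0 * math.log2(f / reference_hz) for f in contour_hz]

def mean(values):
    return sum(values) / len(values)

# Distance between the two F0 maxima reported in the text (140 Hz and 243 Hz).
gap = to_semitones([243.0], 140.0)[0]
print(f"gap between the maxima: {gap:.1f} semitones")

# Hypothetical contours: the second has the same shape, scaled up by a factor of 1.75.
low_voice = [100.0, 120.0, 140.0, 110.0]
high_voice = [f * 1.75 for f in low_voice]
low_st = to_semitones(low_voice, mean(low_voice))
high_st = to_semitones(high_voice, mean(high_voice))
```

The two reported maxima are about 9.5 semitones apart. After normalisation the two hypothetical contours coincide, which is the sense in which two intonation lines can be similar while differing only in pitch level.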
4. Test results
Picture 5 presents the intonation lines of the same compound sentence produced by each
of the participants. The upper two are the women's, the lower two the men's.
Picture 5. Tone contour of compound sentence read by women (two above – female)
and men (two below – male).
There were differences in the accent intensity of particular words in a
sentence. The message and the speaker's interpretation were crucial here. The figure
below presents the tone contours for the sentence “Jutro pojadę na wycieczkę, albo zostanę
w domu” (I will go on a trip tomorrow or stay at home). Three of the speakers decided
that when they would go was more important, and one that the very fact of going on a
trip was significant.
Picture 6. Tone contour of sentence “Jutro pojadę na wycieczkę, albo zostanę w domu”
(I will go on a trip tomorrow or stay at home) read by women (two above – female)
and men (two below – male).
5. Conclusion
The correct and natural use of intonation is very difficult to accomplish. A person
pronouncing a sentence knows exactly what he or she is trying to say and knows the
meaning of the words used. A lot of information is communicated through the accurate
prosody of the spoken text. To get the best results, not only the sentence construction
should be analysed, but also the meaning and layout of its words. For this reason such
tests are crucial to obtaining the best possible quality of synthesised speech.
Knowledge of the intonation and rhythm of particular types of utterances will make
natural speech generation possible. In the future, such a synthesiser will surely be
commonly used.
The expected results can be applied in further research in applied linguistics,
especially in the study of the phonetics and prosody of the Polish language, and in
expanding the theoretical framework for multilingual speech communication systems. The
project has great relevance for the economic and social fields. The obtained results
will facilitate the development of new areas of business activity and services in Poland
connected with the creation of a speech synthesiser. The speech synthesiser can be used
in audio servers to provide information to users of telephone banking and of cultural and
tourist information telephone services. It makes possible round-the-clock telephone
transmission of required information by means of speech in on-line telephone
information services. Another possible application of the synthesiser is in socially
oriented systems, such as computer-based transmission of textual information by
means of voice to the sick, the socially disabled and the blind.
The extension of this work is a project to carry out the opposite process, that is,
speech recognition and its notation in the form of text: Speech-to-Text (STT)
conversion. Recognition and synthesis methods for audio-visual
patterns will be developed.
Acknowledgement
This paper was supported by the European Commission under grant INTAS
Ref. number 04-77-7404. The authors wish to express their thanks for the support.