
PTFonR07:PTFonR07 2008-05-06 15:50 Strona 79

Sentence Intonation for Polish Language


Prosody of Utterances in the Polish Language

Bożena Piórkowska, Janusz Rafałko, Wojciech Lesiński, Edward Szpilewski

Institute of Computer Sciences, University of Bialystok, Bialystok, Poland


boncia@wp.pl

ABSTRACT
The article presents the results of tests examining sentence intonation in Polish. The
tests were performed for the project “Development of multi-voice and multi-language
Text-to-Speech (TTS) and Speech-to-Text (STT) conversion system (languages:
Belarusian, Polish, Russian)”. It gives a short introduction to prosody, with particular
emphasis on syntax (the structure of simple and compound sentences). The examined
material consisted of recordings of a text read aloud by four different people (two women
and two men). The article also presents the analysis procedure and plans for future work.

SUMMARY
The article presents the results of research on sentence intonation in Polish, carried out
for the project “Syntezer mowy polskiej na podstawie tekstu” (Polish speech synthesiser
based on text). It gives a short introduction to prosody, with particular attention to
syntax, i.e. the structure of simple and compound sentences. The research material
consisted of recordings of a text read by four different people (two women and two
men). The article also describes how the analysis was carried out and the directions
of further work.

1. Introduction
The research aims to fill a gap in the introduction and promotion of computerised
speech technology for the Polish language. The decisive factor in achieving high-quality
speech synthesis is the completeness of the resources and databases used. The
research objective is to develop the linguistic resources: vocabulary, grammar and
acoustic databases. The synthesis of the phonemic characteristics of speech is based on
the Allophones Natural Waves method. The basic principle of synthesising the
prosodic features of speech is to divide an utterance into accent groups and, on their
basis, to form the complete tonal, rhythmic and dynamic contours of each syntagm and
of the utterance as a whole. Using a data-driven approach, the speech synthesiser will
draw on prosodic feature databases for the synthesis of both speech sounds and intonation.
Together, the two modules are expected to produce high-quality synthesised speech.
For synthesised speech to sound natural, it needs rhythm and, more importantly, proper
intonation. Individual speakers differ in the way they produce utterances. The voice
signal is described by numerous


80 Speech and Language Technology. Volume 9/10

physical parameters that vary over the course of an utterance. The acoustic parameter
that is particularly important in prosodic analysis is the fundamental frequency (F0) and
its curve (the intonation contour). That is why special emphasis is placed on determining
the intonation contour and the maximum and minimum F0 values for particular types of utterances.

2. Kinds of utterances

Syntax (sentence structure and message) is crucial to the intonation of certain utterances.
One division of utterances is into sentences (clauses) and phrases. Phrases are groups
of words that have either no subject or no predicate, e.g. Przechodzić tylko na zielonym
świetle (Cross only on the green light). A sentence is a group of grammatically interrelated
words containing a subject and a predicate, e.g. Proszę przechodzić tylko na zielonym
świetle (Please cross only on the green light). Simple sentences with modifiers are called
long simple sentences, and those without modifiers short simple sentences. A simple
sentence can be as short as one word. Longer utterances with more than one predicate
are multi-clause sentences; based on the relations between their clauses, compound and
complex sentences are distinguished.
There are several types of compound sentences:
1. Conjunction – the clauses are joined; the co-ordinating conjunctions include i,
oraz, a, jak również, ani (and, as well as, nor, etc.);
e.g. Marek był w górach a Ania nad morzem (Mark was in the mountains and Ann
was by the seaside);
2. Negation – one clause contrasts with or negates the other; the co-ordinating
conjunctions include ale, lecz, a, jednak, zaś, natomiast (but, however, etc.);
e.g. Kasia była spóźniona jednak się nie śpieszyła (Cathy was late but she was not in
a hurry);
3. Disjunction – one clause excludes the other; the co-ordinating conjunctions include
albo, czy, lub, bądź (or, either, etc.);
e.g. Przeczytam książkę albo pójdę do kina (I will read a book or go to the cinema);
4. Implication – the second clause is a consequence of the first; the co-ordinating
conjunctions include więc, toteż, dlatego, zatem (so, that is why, consequently, etc.);
e.g. Marek jest zdolny więc ma wysoką średnią (Mark is talented, so his grade average
is high).
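In a text-to-speech front end, the conjunction lists above could be used to guess which type of compound sentence an input belongs to, so that a matching intonation pattern can be chosen. The sketch below is a hypothetical illustration, not part of the authors' system: the dictionary `CONJUNCTIONS` and the function `compound_types` are invented names, and the word lists are only the examples given in the list above. Because "a" appears both among the conjunction and the negation markers, the function returns every candidate type rather than a single label.

```python
import re

# Co-ordinating conjunctions from Section 2, grouped by compound-sentence type.
# These are the examples listed in the text, not an exhaustive inventory.
CONJUNCTIONS = {
    "conjunction": ["i", "oraz", "a", "jak również", "ani"],
    "negation": ["ale", "lecz", "a", "jednak", "zaś", "natomiast"],
    "disjunction": ["albo", "czy", "lub", "bądź"],
    "implication": ["więc", "toteż", "dlatego", "zatem"],
}


def compound_types(sentence: str) -> set[str]:
    """Return every compound-sentence type whose conjunction occurs in `sentence`.

    Ambiguous conjunctions such as "a" yield more than one candidate type.
    """
    # \w matches Polish letters (ą, ę, ś, ...) because Python str patterns are Unicode-aware.
    words = re.findall(r"\w+", sentence.lower())
    # Pad with spaces so that single letters like "a" or "i" match only as whole words,
    # while multi-word conjunctions such as "jak również" still match.
    text = " " + " ".join(words) + " "
    found = set()
    for kind, conjunctions in CONJUNCTIONS.items():
        if any(f" {c} " in text for c in conjunctions):
            found.add(kind)
    return found
```

For the example "Marek był w górach a Ania nad morzem", the function returns both "conjunction" and "negation", which shows that a real system would need more context than the conjunction alone.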
In complex sentences the components are not equal. In such sentences there is a main
clause and a subordinate clause, which substitutes for or complements one of the parts
of the main clause.
Depending on their purpose and emotional undertone, sentences are divided into
declarative, interrogative, imperative and exclamatory ones. In writing, emotions are
usually expressed with punctuation marks such as the dash, question mark, exclamation
mark or ellipsis.
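Since these sentence types are partly signalled by punctuation, a very rough first guess at the type can be read from the final punctuation mark. The function below, `sentence_mood`, is a hypothetical heuristic, not the authors' method. Its known limitation: imperative sentences that end in a full stop, such as "Proszę przechodzić tylko na zielonym świetle.", will be labelled declarative, and exclamatory and imperative sentences cannot be told apart by punctuation alone.

```python
def sentence_mood(sentence: str) -> str:
    """Guess the sentence type from its final punctuation mark (heuristic only)."""
    # Ignore trailing whitespace and closing quotation marks or brackets.
    end = sentence.rstrip().rstrip("\"'”»)")
    if end.endswith("?"):
        return "interrogative"
    if end.endswith("!"):
        # Exclamatory and imperative sentences both commonly end with "!".
        return "exclamatory/imperative"
    return "declarative"
```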

3. Research method

The first step was to prepare a suitable text for recording. It had to contain all the
examined types of sentences, with several sentences of each type. In addition, the
sentences differed from one another in their conjunctions and message.
Picture 1. Spectrogram of a sentence read by a man.

Picture 2. Spectrogram of a sentence read by a woman.

Picture 3. Tone contours of a sentence read by women (two above – female) and men (two below – male).

Next, the four participants read the text aloud and it was recorded using SoundForge.
Each person was recorded separately, so as not to suggest an intonation to the other
participants. For every sentence, a spectrogram (a spectro-temporal representation of the
sound) and an intonation contour were generated with Praat, a computer program with
which phoneticians can analyse, synthesise and manipulate speech. The computer
analysis provided important data on F0 fluctuation and mean F0, and examination of the
tone contours also yielded interesting results.
Spectrograms and tone contours of the sentence “Maciek był w górach a Ania nad
morzem” (Maciek was in the mountains and Ania was by the seaside), read by a man and
by a woman, are presented in Pictures 1–3. The man clearly has a lower-pitched voice: his
maximum F0 is about 140 Hz, whereas the woman's is about 243 Hz. The shape of the
woman's intonation line is similar to the man's; they differ only in pitch height, which is
normal, as women have higher voices than men.
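Praat can report these values itself; the arithmetic behind the Fmin, Fmax and mean F0 figures can be sketched as follows. The sketch assumes the F0 contour has been exported as one value per analysis frame in Hz, with unvoiced frames marked as 0 or None (an assumed export format; the study does not specify one). The function name `f0_stats` is hypothetical, and the mean is a plain average over voiced frames; the paper does not state exactly how its mean values were computed.

```python
from statistics import mean
from typing import Optional, Sequence


def f0_stats(track: Sequence[Optional[float]]) -> tuple[float, float, float]:
    """Return (Fmin, Fmax, Favr) in Hz over the voiced frames of an F0 track.

    Unvoiced frames are assumed to be encoded as None or 0 and are skipped,
    because F0 is undefined where there is no voicing.
    """
    voiced = [f for f in track if f]  # drops None and 0
    if not voiced:
        raise ValueError("track contains no voiced frames")
    return min(voiced), max(voiced), mean(voiced)
```

Skipping unvoiced frames matters: averaging the zeros in would drag the mean F0 far below any value the speaker actually produced.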

4. Test results

4.1. Fluctuation of acoustic parameters


The fundamental frequency F0 is strongly influenced by the pitch (high or low) of the
speaker's voice. Generally, the higher a speaker's basic tone, the greater the range
between Fmin and Fmax. This is shown in the figure below, which gives the minimum
and maximum frequencies of sample compound negation sentences; the same
utterances were chosen for every speaker.

Picture 4. Minimum and maximum frequencies of sample negation sentences.

The message of the utterance is of lesser importance, whereas the type of utterance
needs special attention. Table 1 presents the minimum, maximum and mean frequencies
in different utterances. Exclamatory and interrogative sentences have a broader
frequency range, and the highest Fmax values generally occur in exclamatory and
imperative sentences.
Tone contour observations showed that the participants' intonation lines for the same
sentence do not differ significantly in shape. This is clearly seen in fig. 1.4, which
presents the intonation lines of the same compound sentence produced by each of the
participants: the two upper ones are the women's, the two lower ones the men's.

Table 1. Minimum (Fmin), maximum (Fmax) and mean (Favr) F0 values in Hz in different utterances

Sentence type       Woman_1          Woman_2          Man_1            Man_2
                    Fmin Fmax Favr   Fmin Fmax Favr   Fmin Fmax Favr   Fmin Fmax Favr
conjunction          176  265  231    133  236  206    127  212  154     93  124  110
                     191  275  237    157  235  201    126  213  162     93  123  110
disjunction          180  292  245    160  244  206    119  204  156     89  135  112
                     170  288  243    161  253  213    136  200  170     93  126  111
negation             224  277  244    136  288  211    125  233  167    103  136  112
                     208  285  244    174  252  211    133  197  177    132  197  107
complex              179  273  236    139  256  207    132  182  161     92  123  110
                     166  287  242    156  262  214    142  197  171    100  136  113
implication          204  285  241    110  267  157    129  227  168     99  138  111
                     177  276  238    166  253  202    134  219  181    105  177  113
exclamatory          199  298  275    195  292  261    146  273  214    104  194  162
                     137  297  263    171  247  221    126  225  176    102  200  166
interrogative        175  282  175    110  266  223    103  209  103    106  173  106
                     233  277  233    182  256  223    146  200  146    130  146  130
declarative          126  266  222    175  271  234    127  223  171    102  144  122
                     183  269  223    158  274  211    113  225  160    108  155  124
imperative           190  286  247    167  284  238    136  219  181    111  161  142
                     194  282  235    182  290  251    125  206  159    103  174  150
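Statements such as "exclamatory and interrogative sentences have a broader frequency range" can be checked speaker by speaker against Table 1 by computing Fmax − Fmin for each row. The sketch below copies the Table 1 values verbatim and assumes that the two rows given for each sentence type are two sample utterances of that type; `parse_table` and `mean_f0_range` are hypothetical helper names, and the Favr columns are carried along but not used.

```python
SPEAKERS = ("Woman_1", "Woman_2", "Man_1", "Man_2")

# Table 1 values in Hz: each row holds (Fmin, Fmax, Favr) for Woman_1, Woman_2, Man_1, Man_2.
TABLE_1 = """
conjunction    176 265 231  133 236 206  127 212 154   93 124 110
conjunction    191 275 237  157 235 201  126 213 162   93 123 110
disjunction    180 292 245  160 244 206  119 204 156   89 135 112
disjunction    170 288 243  161 253 213  136 200 170   93 126 111
negation       224 277 244  136 288 211  125 233 167  103 136 112
negation       208 285 244  174 252 211  133 197 177  132 197 107
complex        179 273 236  139 256 207  132 182 161   92 123 110
complex        166 287 242  156 262 214  142 197 171  100 136 113
implication    204 285 241  110 267 157  129 227 168   99 138 111
implication    177 276 238  166 253 202  134 219 181  105 177 113
exclamatory    199 298 275  195 292 261  146 273 214  104 194 162
exclamatory    137 297 263  171 247 221  126 225 176  102 200 166
interrogative  175 282 175  110 266 223  103 209 103  106 173 106
interrogative  233 277 233  182 256 223  146 200 146  130 146 130
declarative    126 266 222  175 271 234  127 223 171  102 144 122
declarative    183 269 223  158 274 211  113 225 160  108 155 124
imperative     190 286 247  167 284 238  136 219 181  111 161 142
imperative     194 282 235  182 290 251  125 206 159  103 174 150
"""


def parse_table(text: str) -> dict[str, list[list[int]]]:
    """Group the table rows by sentence type."""
    rows: dict[str, list[list[int]]] = {}
    for line in text.strip().splitlines():
        kind, *values = line.split()
        rows.setdefault(kind, []).append([int(v) for v in values])
    return rows


def mean_f0_range(rows: dict[str, list[list[int]]], speaker: str) -> dict[str, float]:
    """Mean of (Fmax - Fmin) in Hz over the rows of each sentence type for one speaker."""
    col = 3 * SPEAKERS.index(speaker)  # Fmin column; Fmax is col + 1, Favr is col + 2
    return {
        kind: sum(r[col + 1] - r[col] for r in samples) / len(samples)
        for kind, samples in rows.items()
    }


if __name__ == "__main__":
    table = parse_table(TABLE_1)
    for speaker in SPEAKERS:
        ranges = mean_f0_range(table, speaker)
        for kind, hz in sorted(ranges.items(), key=lambda kv: kv[1], reverse=True):
            print(f"{speaker:8s} {kind:14s} {hz:6.1f} Hz")
```

Running the script ranks the sentence types by mean F0 range for each speaker, which makes it easy to see where the general tendencies described above hold and where individual speakers deviate from them.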
Picture 5. Tone contour of a compound sentence read by women (two above – female)
and men (two below – male).

There were differences in the accent intensity of particular words in a sentence; the
message and the speaker's interpretation were crucial here. The figure below presents
the tone contour of the sentence “Jutro pojadę na wycieczkę, albo zostanę w domu” (I will
go on a trip tomorrow or stay at home). Three of the speakers treated the time of going as
the most important information, whereas one treated the very fact of going on a trip as
the most significant.

Picture 6. Tone contour of the sentence “Jutro pojadę na wycieczkę, albo zostanę w domu”
(I will go on a trip tomorrow or stay at home) read by women (two above – female)
and men (two below – male).

4.2. Syntax analysis


A comparison of the intonation lines of the various compound sentences failed to show
any significant differences. There were, of course, utterances whose tone contour departed
from the general model but, as mentioned previously, this was due to the speaker's
interpretation.
The contour of such an utterance consists of two rise-and-fall parts. The accented
words usually belong to the subject, and there the tone contour is high. Where the
conjunction appears, the contour rises again.
Complex sentences have a different F0 contour: regardless of the sentence construction
(main clause before the subordinate one or vice versa), the tone contour falls.
Interrogative sentences have a rising intonation line, with the strongest rise on the last
words. In imperative and exclamatory sentences the intonation line first rises sharply and
then falls. In declarative sentences the tone contour also rises and falls, but less abruptly
than in imperative and exclamatory sentences. The figure below shows the intonation
lines of interrogative, exclamatory, imperative and declarative sentences.
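The qualitative shapes described above could be encoded as F0 targets in a simple rule-based prosody module. The sketch below is a hypothetical piecewise-linear model, not the authors' data-driven synthesiser: `TEMPLATES` and `contour` are invented names, and every breakpoint position and relative height is an illustrative guess chosen only to reproduce the described shapes (they were not fitted to the recordings). Heights are relative, so that 0 maps to a speaker's Fmin and 1 to Fmax; for the compound template, the conjunction is assumed to fall at mid-utterance.

```python
# Hypothetical contour templates: lists of (relative time, relative height) breakpoints,
# both in [0, 1]. Height 0 maps to the speaker's Fmin and 1 to Fmax.
# The numbers are illustrative guesses, NOT values measured in this study.
TEMPLATES: dict[str, list[tuple[float, float]]] = {
    # Rising line, with the strongest rise on the last words.
    "interrogative": [(0.0, 0.3), (0.7, 0.45), (1.0, 1.0)],
    # Sharp initial rise, then a fall.
    "imperative": [(0.0, 0.3), (0.15, 1.0), (1.0, 0.1)],
    "exclamatory": [(0.0, 0.3), (0.15, 1.0), (1.0, 0.1)],
    # Rise and fall, but less abrupt.
    "declarative": [(0.0, 0.4), (0.4, 0.7), (1.0, 0.2)],
    # Falling contour regardless of clause order.
    "complex": [(0.0, 0.8), (1.0, 0.1)],
    # Two rise-and-fall parts; the conjunction (second rise) is assumed at t = 0.5.
    "compound": [(0.0, 0.4), (0.2, 0.9), (0.5, 0.3), (0.65, 0.7), (1.0, 0.1)],
}


def contour(kind: str, fmin: float, fmax: float, n_points: int) -> list[float]:
    """Sample a template at n_points evenly spaced times and return F0 values in Hz."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    points = TEMPLATES[kind]
    f0 = []
    for k in range(n_points):
        t = k / (n_points - 1)
        # Linear interpolation inside the segment that contains t
        # (every template starts at t = 0 and ends at t = 1, so a segment is always found).
        for (t0, h0), (t1, h1) in zip(points, points[1:]):
            if t0 <= t <= t1:
                h = h0 + (h1 - h0) * (t - t0) / (t1 - t0)
                break
        f0.append(fmin + h * (fmax - fmin))
    return f0
```

For example, `contour("interrogative", 175.0, 282.0, 50)` scales the rising template to the Fmin and Fmax of Woman_1's first interrogative sentence in Table 1. A data-driven synthesiser, as described in the Introduction, would replace such hand-set breakpoints with values taken from the prosodic database.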

5. Conclusion

The correct and natural use of intonation is very difficult to achieve. A person
pronouncing a sentence knows exactly what they are trying to say and what the words
they use mean. Much information is communicated through the accurate prosody of the
spoken text. To get the best results, not only the sentence construction but also the
meaning and arrangement of its words should be analysed. This is why such tests are
crucial for obtaining the best possible quality of synthesised speech. Knowledge of the
intonation and rhythm of particular types of utterances will make natural speech
generation possible. In the future, such a synthesiser will surely be in common use.
The expected results can be applied in further research in applied linguistics,
especially in the study of the phonetics and prosody of the Polish language, and in
expanding the theoretical framework for multilingual speech communication systems.
The project is highly relevant to economic and social fields. The results obtained will
facilitate the development of new areas of business activity and services in Poland
connected with speech synthesis. The speech synthesiser can be used in audio servers
that provide information to users in telephone banking and in cultural and tourist
information telephone services; it makes round-the-clock spoken transmission of the
required information possible, for example in on-line telephone information services.
One possible application of the synthesiser is in socially oriented systems, such as the
computer-based transmission of textual information by means of voice to the sick, to
people with disabilities and to the blind.
This work will be extended by a project on the opposite process: recognising speech and
transcribing it as text, i.e. Speech-to-Text (STT) conversion. Methods for recognising and
synthesising audio-visual patterns will also be developed.

Acknowledgement

This paper was supported by the EUROPEAN COMMISSION under grant INTAS
Ref. number 04-77-7404. The authors wish to express their thanks for the support.

