
The 6th Balkan Region Conference on Engineering and Business Education &
The 5th International Conference on Engineering and Business Education &
The 4th International Conference on Innovation and Entrepreneurship
Sibiu, Romania, October, 18th – 21st, 2012
DOI 10.2478/cplbu-2014-0092

TEST OF VOWELS IN SPEECH RECOGNITION USING CONTINUOUS DENSITY HIDDEN MARKOV MODEL AND DEVELOPMENT OF PHONETICALLY BALANCED-WORDS IN THE FILIPINO LANGUAGE

Arnel C. Fajardo1 and Yoon-joong Kim2


1 Hanbat National University / La Consolacion College Manila, acfajardo2000@yahoo.com
2 Hanbat National University, yjkim@hanbat.ac.kr

An Automatic Speech Recognition (ASR) system converts speech signals into words. The recognized words can be the final output or can serve as input for natural language processing. In this paper, a vowel recognizer was developed using a continuous density HMM with Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction, and phonetically balanced words (PBW) in Filipino were developed. This study is thus a preparation for a Filipino-language ASR using HMM. For the vowel recognizer, speech from forty speakers (20 male and 20 female) was used for training. An average accuracy rate of 94.5% was achieved for the speaker-dependent test and 90.8% for the speaker-independent test. For PBW, two word lists were developed: 257 words for the 2-syllable Filipino PBW word list and 212 words for the 3-syllable Filipino PBW word list.

Key words: Continuous Density Hidden Markov Model, Filipino Vowels and Phonemes, Phonetically Balanced Words.

1. INTRODUCTION

Speech is one of the most effective means of communication and one of the first skills a human acquires through interaction with the environment. In the past decades, computer scientists as well as linguists have been researching effective means of recognizing speech through automated machines.

Automatic speech recognition is the process of decoding speech into its corresponding sequence of words. Filipino researchers have worked to provide an accurate speech recognizer over the past years [1][2]. However, none of these efforts provides an efficient solution for the Filipino language.

The Hidden Markov Model (HMM) is a doubly stochastic process in which one process is not directly observable [3]. This hidden process can be observed only through another set of stochastic processes that produce the observation sequence.

HMMs are so far the most widely used acoustic model for speech recognition [4]. This model was used in previous studies on automatic speech recognizers for the Filipino language [1][2].

In 2003, an ASR for Filipino phonemes was developed [1]. This study reported a recognition accuracy of 85.5%. However, this recognizer was built for phoneme utterances using discrete HMM rather than for continuous word recognition.

Dela Roca G., et al. (2003) attempted to recognize continuous speech using the Filipino Speech Corpus developed by Guevara R., et al. (2002), and reported a recognition accuracy of only 32%. In an attempt to increase this accuracy, a study in 2010 incorporated an Indonesian speech corpus as training data for recognizing Filipino utterances [5]. The Indonesian speech corpus contains 80 hours of recording, while the Filipino speech corpus developed in 2003 contains 4 hours of recording. This cross-lingual approach achieved 79.50% recognition accuracy.

In this effort, the researchers' objectives are to (1) demonstrate the efficiency of HMM for recognizing Filipino words using a Mel-frequency Cepstral Coefficient approach, and (2) create a phonetically balanced word list of commonly used words in the Filipino language.

2. FILIPINO LANGUAGE

Filipino is the language used largely in the Philippines, with 22 million native speakers [6].

Between the 1930s and the mid-1970s, a system of syllabication for the alphabet called abakada was developed by Lope K. Santos to represent the native sounds [7]:

a ba ka da e ga ha i la ma na nga o pa ra sa ta u wa ya

This system represents the Filipino alphabet, consisting of 5 vowels (a, e, i, o, u) and 15 consonants (b, k, d, g, h, l, m, n, ng, p, r, s, t, w, y). The Filipino alphabet, though in a sense considered phonetic, does not exactly reflect the correct sound in written form [8]. Some words in the Filipino language are spelled the same but pronounced with slight differences, which produce a difference in meaning.

bata /b:a - ta/ - "a child"
bata /ba - ta/ - "to bear or endure"
*****
The word bata with the phonetic representation /b:a - ta/ denotes a long sound, which is produced by a short pause after the affected syllable, while the other phonetic representation /ba - ta/ is produced continuously without breaks.

Thus, the Filipino phonemes can be broken down into the following phones:

Vowels
/a/ /e/ /i/ /o/ /u/

Consonants
/b/ /k/ /d/ /g/ /h/ /l/ /m/ /n/ /ŋ/ /p/ /r/ /s/ /t/ /w/ /y/

Table 1: The Filipino Vowel System

        front   central   back
high    /i/               /u/
mid     /e/               /o/
low             /a/

The Filipino vowel phonemes can be described as /a/ low central unrounded, /e/ mid front unrounded, /i/ high front unrounded, /o/ mid back rounded, and /u/ high back rounded. According to tongue position, there are two front vowel phonemes /i e/, two back vowel phonemes /o u/, and one central vowel phoneme /a/.

Table 2: The Filipino Consonant System (places of articulation: labial, dental, alveolar, palatal, velar, glottal)

Manner       Voicing     Phonemes
stops        voiceless   /p/ /t/ /k/
stops        voiced      /b/ /d/ /g/
nasals       voiced      /n/ /ŋ/
fricatives   voiceless   /s/ /h/
affricates   voiceless   (none listed)
lateral      voiced      /l/
flap         voiced      /r/
glide        voiced      /w/ /y/

The Filipino consonants are produced with the help of the lips (labial), teeth (dental), alveolar ridge (alveolar), palate (palatal), velum (velar), and glottis (glottal).

3. TEST OF VOWEL RECOGNITION

Filipino native speakers were selected for voice recording sessions. Fifty speakers were asked to record their voices, 25 female and 25 male. These speakers were drawn from the undergraduate students and faculty members of the School of Information Technology at La Consolacion College Manila, Philippines.

The speakers have Filipino as their first language and are able to read and speak Filipino. The speakers are fluent in the language, have no speech ailments, and were in their proper dispositions during recording.

3.1. Recording Specifications

The recordings were made in an isolated room using a uni-directional microphone connected to a computer, with input speech sampled at 16 kHz in mono. A distance of approximately 5-10 centimetres was kept between the mouth of the speaker and the microphone.

3.2. Recorded Speech

The speakers were asked to utter the Filipino phonemes /a/, /e/, /i/, /o/, and /u/. The speech collected was used as training data and test data.

The training data were gathered from 20 female and 20 male speakers, who each recorded 4 sets of vowel utterances. The test data were grouped as 'speaker dependent' and 'speaker independent'. The speaker-dependent speech was taken from the same speakers as the training data (20 female and 20 male), who recorded another set of vowel utterances, while the speaker-independent speech was taken from 5 female and 5 male speakers not included in the training data, who recorded one set of vowel utterances.

The recordings were named with the format "A1 001.wav", where:

A – phoneme (a/e/i/o/u)
1 – speaker's gender (1 male, 0 female)
001 – number of the recorded set

3.3. Hidden Markov Model Toolkit (HTK)

The recorded speech data were converted into Mel-frequency Cepstral Coefficients (MFCC), which were used to train a prototype HMM using HTK tools.

The percentage of vowel utterances correctly recognized by HTK defines the recognition accuracy, shown as follows:

Table 3: Accuracy rate taken from the test data of dependent speakers.

Vowel     Male (dependent),    Female (dependent),   Average (%)
          20 speakers (%)      20 speakers (%)
A         100                  100                   100
E         100                  85                    92.5
I         100                  95                    97.5
O         95                   85                    90
U         95                   90                    92.5
Results   98                   91                    94.5

Table 4: Accuracy rate taken from the test data of independent speakers.

Vowel     Male (independent),  Female (independent), Average (%)
          5 speakers (%)       5 speakers (%)
A         100                  100                   100
E         100                  76                    88
I         100                  64                    82
O         96                   100                   98
U         72                   100                   86
Results   93.6                 88                    90.8
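The averages reported in Tables 3 and 4 can be re-derived from the per-vowel accuracy rates. The following is a minimal sketch, not the authors' tooling; the names `DEPENDENT`, `INDEPENDENT`, and `summarize` are hypothetical and introduced only for illustration.

```python
# Accuracy rates (%) per vowel, copied from Tables 3 and 4.
DEPENDENT = {
    "male": {"a": 100, "e": 100, "i": 100, "o": 95, "u": 95},
    "female": {"a": 100, "e": 85, "i": 95, "o": 85, "u": 90},
}
INDEPENDENT = {
    "male": {"a": 100, "e": 100, "i": 100, "o": 96, "u": 72},
    "female": {"a": 100, "e": 76, "i": 64, "o": 100, "u": 100},
}


def summarize(table):
    """Return (per-vowel averages, per-gender means, overall mean)."""
    vowels = list(table["male"])
    # Average of the male and female rate for each vowel ("Average" column).
    per_vowel = {v: (table["male"][v] + table["female"][v]) / 2 for v in vowels}
    # Mean over the five vowels for each gender ("Results" row).
    per_gender = {g: sum(r.values()) / len(r) for g, r in table.items()}
    # Mean of the two gender means (bottom-right cell).
    overall = sum(per_gender.values()) / len(per_gender)
    return per_vowel, per_gender, overall


if __name__ == "__main__":
    for name, table in (("dependent", DEPENDENT), ("independent", INDEPENDENT)):
        per_vowel, per_gender, overall = summarize(table)
        print(name, per_vowel, per_gender, round(overall, 1))
```

Because both gender groups contain the same number of speakers within each test, the mean of the two gender means equals the mean over all speakers in that test.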
*****
The recognition accuracy for the Filipino vowel phonemes decreased slightly from the dependent speakers, with a mean of 94.5%, to the independent speakers, with a mean of 90.8%. This is anticipated, since the independent speakers were not common to the training data.

4. PHONETICALLY BALANCED WORDS

In line with the results of the vowel recognition test for the Filipino language, a phonetically balanced word list could be used with the same methodology to test recognition of Filipino words using continuous density HMM for larger-vocabulary word recognition.

In the construction of large-vocabulary word recognition, a set of recordings must be obtained from a spoken corpus derived from a written corpus or from a phonetically balanced word list. A Filipino speech corpus was developed by Guevara et al. [2] that includes both open-ended and close-ended spoken words. The training data produced by this methodology are not phonetically balanced. Phonetically balanced speech texts used for English, German, Swedish, Danish, Hebrew, Italian, Finnish, French, and Portuguese are often built on the following criteria [9][10]: syllable structure, equal phonetic structure, phonetic balance, equal average difficulty and equal range of difficulty, common words, and speaker intelligibility.

An Ilocano phonetically balanced word list, based on the Ilocano language (the third largest dialect spoken in the Philippines [11]), was produced in 2003 using the following criteria [12]: phonetic balance within separate lists (50-word lists), syllable structure, and commonness of words/familiarity.

This 2-syllable Ilocano word list was gathered from locally published articles and from 2-syllable words in an Ilocano dictionary.

4.1. Development of PBW

The Filipino phonetically balanced words were drawn from 16 articles found in "Bagwis", a Filipino-language textbook written for senior public school students. This textbook is approved by the Department of Education and Sports Commission of the Philippine Government, and can thus be considered reliable with a minimal chance of error. All the articles extracted from the textbook are written in Filipino and contain a total of 9768 words.

Table 5: List of articles and their corresponding word counts from "Bagwis IV"

Title                                                              Words
Ang Gilingang Bato                                                 2905
Sa Yapak ng Pambansang bayani sa Heidelberg                        1154
Ang Wika Ng Pilipino At Ang Banta Ng Globalisasyon                 1150
Ang Kulturang Pilipino Ng Mga Wikang Filipino                      1006
Walang Panginoon                                                   607
Sandaling Repleksyon                                               550
Ang Sex Education Ni Inay Ukol Sa Origin Ng Mga Bata               548
Ang Alibughang Anak                                                436
Ang Mangmang at ang Pari                                           317
Ang Mga Kagila-gilalas Na Pakikipagsapalaran Ni Juan Dela Cruz     248
Tinawid Ni Pilandok Ang Ilog                                       227
Ang Alamat ng Saging                                               222
Haring Ibon                                                        135
Kabataan Ng Lahing Kayumanggi                                      101
Ang Pilipina Sa Bagong Milenyo                                     94
Kapalaran                                                          68
Total                                                              9768

These words were screened for unique words in each article. The extracted list of unique words was manually transcribed phonetically based on the UP Diksyonaryong Filipino, a monolingual dictionary maintained by the University of the Philippines Center for Languages [13]. Phonemes such as /p:/ /b:/ /m:/ /t:/ /d:/ /n:/ /s:/ /l:/ /k:/ /g:/ were included to denote a longer duration of pronunciation compared to their shorter counterparts. The diphthongs /iw/ /ay/ /aw/ /oy/ /ey/ /uy/ were also included as part of the vowel phoneme list.

The 2938 unique words found among the 9768 words in the articles were tabulated in an Excel spreadsheet, which records each word's frequency, phonetic structure, syllabication, and number of syllables.

Table 6: Frequency of syllable counts from the extracted unique words

Syllable Count   Frequency
1-syllable       101
2-syllable       780
3-syllable       912
4-syllable       740
5-syllable       299
6-syllable       72
7-syllable       23
8-syllable       7
9-syllable       2
10-syllable      1
13-syllable      1
Total            2938

A tabulation of the frequency of syllable counts was extracted from the Excel spreadsheet as a basis for selecting the best syllabically homogeneous words from the 2938 unique words.

Table 7: 2-syllable and 3-syllable words and their frequency of occurrence.

Syllable Count   1 occurrence   >1 occurrences   Total
2-syllable       457            323              780
3-syllable       663            249              912

From the 780 2-syllable and 912 3-syllable words found in the list, a total of 323 2-syllable and 249 3-syllable words that occur more than once were extracted, to ensure commonality of words. These words were grouped according to their phonetic structure, and the most frequent structures, constituting at least 80% of the total number of 2-syllable and 3-syllable words, were retained.
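The grouping step described above (keep words occurring more than once, then group them by phonetic structure) can be sketched as follows. This is an illustrative sketch only: the paper's transcription was done manually in a spreadsheet, and the helper names `cv_structure` and `common_words_by_structure`, the hyphen-separated input format, and the treatment of the digraph "ng" as the single consonant /ŋ/ are all assumptions made here.

```python
from collections import Counter, defaultdict

VOWELS = set("aeiou")


def cv_structure(syllabified):
    """Map a hyphen-syllabified word to its c/v pattern, e.g. 'ba-ta' -> 'cv-cv'.

    Assumptions: 'ng' is one consonant (/ŋ/) and a ':' length mark is ignored.
    """
    parts = []
    for syllable in syllabified.lower().split("-"):
        syllable = syllable.replace(":", "").replace("ng", "ŋ")
        parts.append("".join("v" if ch in VOWELS else "c" for ch in syllable))
    return "-".join(parts)


def common_words_by_structure(tokens, syllable_count):
    """Group words with the given syllable count that occur more than once."""
    counts = Counter(tokens)
    groups = defaultdict(list)
    for word, freq in sorted(counts.items()):
        if freq > 1 and word.count("-") + 1 == syllable_count:
            groups[cv_structure(word)].append(word)
    return dict(groups)


if __name__ == "__main__":
    # Toy token stream (illustrative, not taken from the Bagwis articles).
    tokens = ["ba-ta", "ba-ta", "i-log", "sa-ging", "sa-ging", "a-la-mat", "a-la-mat"]
    print(common_words_by_structure(tokens, 2))
    print(common_words_by_structure(tokens, 3))
```

Counting the words in each group then gives a structure-by-frequency tabulation of the kind used to pick the dominant phonetic structures.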
*****
Table 8: Phonetic structure of the 2-syllable unique word list

Phonetic Structure   Frequency   Vowels   Consonants
cv-cvc               130         260      390
cv-cv                61          122      122
v-cvc                43          86       86
cvc-cv               27          54       81
cvc-cvc              25          50       100
cv-vc                13          26       26
v-vc                 12          24       24
vc-cvc               3           6        9
ccv-cv               3           6        9
vc-cv                2           4        4
cv-vc                1           1        2
v-v                  1           2        0
cvc-ccv              1           2        4
cvc-ccvc             1           2        5
cv-v                 1           2        1
Total                323         646      849

Table 9: Phonetic structure of the 3-syllable unique word list

Phonetic Structure   Frequency   Vowels   Consonants
cv-cv-cvc            100         300      400
cv-cv-cv             38          114      114
cv-cvc-cvc           15          45       75
cvc-cv-cvc           14          42       70
v-cv-cvc             11          33       33
cvc-cv-cv            11          33       44
cv-cv-vc             9           27       27
v-cv-cv              8           24       16
cv-cvc-cv            8           24       32
cv-v-cvc             6           18       18
cv-cv-v              3           9        6
cv-vc-cvc            3           9        12
cv-v-cv              3           9        6
cvc-cvc-cv           3           9        15
cvc-cvc-cvc          2           6        12
cvc-cv-vc            2           6        8
cvc-v-cvc            2           6        8
vc-ccv-cvc           1           3        5
vc-cvc-cvc           1           3        5
vc-cv-cvc            1           3        4
vc-cv-ccvc           1           3        5
v-v-cvc              1           3        2
v-cvc-cvc            1           3        4
cvc-v-cv             1           3        3
cvc-vc-cvc           1           3        5
cvc-cv-v             1           3        3
cv-cv-ccv            1           3        4
ccvc-cv-cv           1           3        5
ccv-cvc-cv           1           3        5
ccv-cv-cv            1           3        4
Total                251         753      950

214 of the 3-syllable words are represented by the following phonetic structures: cv-cv-cvc, cv-cv-cv, cv-cvc-cvc, cvc-cv-cv, cvc-cv-cvc, v-cv-cvc, cv-cv-vc, cv-cvc-cv, and v-cv-cv, with relative frequencies of 0.4016, 0.1526, 0.0602, 0.0441, 0.0562, 0.0441, 0.0361, 0.0321, and 0.0321 respectively. Likewise, 261 of the 2-syllable words are represented by the following phonetic structures: cv-cvc, cv-cv, v-cvc, and cvc-cv, with relative frequencies of 0.4024, 0.1889, 0.1331, and 0.0836 respectively.

The frequency of each phoneme is calculated with the formula:

$$F = \frac{pfw \times wf}{n}$$

Where:
F    frequency of the phoneme represented in the list
pfw  frequency of the phoneme in a word
wf   frequency of word occurrence
n    total number of phonemes

This value is compared to the acceptance value, given by the formula:

$$aV = \frac{1}{x \times n}$$

Where:
aV   acceptance value
x    average number of vowels/consonants in a phonetic structure
n    total number of words

Table 10: Acceptance values of phonemes for the 2-syllable and 3-syllable words

                   Vowels   Consonants
2-syllable words   0.0023   0.0018
3-syllable words   0.0015   0.0012

The acceptance value for the 2-syllable word list is 0.0023 for vowels and 0.0018 for consonants, based on the 261 2-syllable words, while the acceptance value for the 3-syllable word list is 0.0015 for vowels and 0.0012 for consonants, based on the 214 words in the list. These values are compared with the frequency of each phoneme in the list to validate whether the phoneme is well represented. Phonemes with frequencies lower than the acceptance value are considered not represented, so the words containing them are removed from the accumulated list, while phonemes with frequencies higher than or equal to the acceptance value are considered well represented by the list.
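The two formulas for the phoneme frequency F and the acceptance value aV can be sketched in code. The source gives F = (pfw * wf) / n for a single word, so this sketch assumes the per-word terms are summed over all words in the list and that n is the occurrence-weighted total number of phonemes, which makes all F values sum to 1. The function names and the toy input are hypothetical.

```python
from collections import Counter


def phoneme_frequencies(words):
    """Compute F for every phoneme.

    words: list of (phonemes, wf) pairs, where phonemes is the list of phoneme
    symbols of one word and wf is how often that word occurs.
    Assumption: pfw * wf is summed over all words, and n is the weighted total.
    """
    weighted = Counter()
    for phonemes, wf in words:
        for phoneme, pfw in Counter(phonemes).items():
            weighted[phoneme] += pfw * wf
    n = sum(weighted.values())
    return {phoneme: count / n for phoneme, count in weighted.items()}


def acceptance_value(x, n_words):
    """aV = 1 / (x * n): x is the average number of vowels (or consonants)
    per phonetic structure, n_words the total number of words."""
    return 1.0 / (x * n_words)


if __name__ == "__main__":
    # Toy list: 'bata' seen twice, 'ilog' seen once (illustrative only).
    toy = [(["b", "a", "t", "a"], 2), (["i", "l", "o", "g"], 1)]
    freqs = phoneme_frequencies(toy)
    threshold = acceptance_value(x=2, n_words=2)
    for phoneme, f in sorted(freqs.items()):
        print(phoneme, round(f, 4), "ACCEPT" if f >= threshold else "REJECT")
```

The toy values are not meant to reproduce Table 10, since the exact x used for each list is not stated in the text.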
*****
Table 11: Frequency of phonemes represented by the developed 2-syllable phonetically balanced word list

Consonant   Frequency   Remarks     Vowel   Frequency   Remarks
B           0.0196      ACCEPT      A       0.2460      ACCEPT
K           0.0311      ACCEPT      E       0.0049      ACCEPT
D           0.0184      ACCEPT      I       0.1261      ACCEPT
G           0.0175      ACCEPT      O       0.0347      ACCEPT
H           0.0247      ACCEPT      U       0.0279      ACCEPT
L           0.0308      ACCEPT      IW      0.0000      REJECT
M           0.0454      ACCEPT      AY      0.0119      ACCEPT
N           0.0642      ACCEPT      AW      0.0041      ACCEPT
ŋ           0.0779      ACCEPT      OY      0.0028      ACCEPT
P           0.0122      ACCEPT      EY      0.0000      REJECT
R           0.0128      ACCEPT      UY      0.0000      REJECT
S           0.0348      ACCEPT
T           0.0357      ACCEPT      Total   0.4584
W           0.0135      ACCEPT
Y           0.0367      ACCEPT
P:          0.0089      ACCEPT
B:          0.0115      ACCEPT
M:          0.0002      REJECT
T:          0.0049      ACCEPT
D:          0.0075      ACCEPT
N:          0.0167      ACCEPT
S:          0.0021      ACCEPT
L:          0.0096      ACCEPT
K:          0.0041      ACCEPT
G:          0.0009      REJECT
Total       0.5416

Table 11 shows the distribution of the 31 phonemes (23 consonants and 8 vowels) that the developed list represents. From the tentative list of 261 2-syllable words, a final list of 257 2-syllable words resulted after removing the words containing phonemes that are not represented by the list. Table 12 shows the distribution of the 32 phonemes (25 consonants and 7 vowels) that the developed list represents. From the tentative list of 214 3-syllable words, a final list of 212 3-syllable words resulted after removing the words containing phonemes that are not represented by the list.

Table 12: Frequency of phonemes represented by the developed 3-syllable phonetically balanced word list

Consonant   Frequency   Remarks     Vowel   Frequency   Remarks
B           0.0137      ACCEPT      A       0.2591      ACCEPT
K           0.0316      ACCEPT      E       0.0035      ACCEPT
D           0.0210      ACCEPT      I       0.1044      ACCEPT
G           0.0440      ACCEPT      O       0.0226      ACCEPT
H           0.0183      ACCEPT      U       0.0473      ACCEPT
L           0.0575      ACCEPT      IW      0.0000      REJECT
M           0.0501      ACCEPT      AY      0.0039      ACCEPT
N           0.0742      ACCEPT      AW      0.0022      ACCEPT
ŋ           0.0298      ACCEPT      OY      0.0010      REJECT
P           0.0273      ACCEPT      EY      0.0000      REJECT
R           0.0194      ACCEPT      UY      0.0000      REJECT
S           0.0210      ACCEPT
T           0.0556      ACCEPT      Total   0.4439
W           0.0261      ACCEPT
Y           0.0110      ACCEPT
P:          0.0037      ACCEPT
B:          0.0082      ACCEPT
M:          0.0082      ACCEPT
T:          0.0037      ACCEPT
D:          0.0029      ACCEPT
N:          0.0073      ACCEPT
S:          0.0020      ACCEPT
L:          0.0118      ACCEPT
K:          0.0043      ACCEPT
G:          0.0035      ACCEPT
Total       0.5561

The frequency calculated for each phoneme is compared with the acceptance value of 0.0023 for vowels and 0.0018 for consonants in the 2-syllable word list, and 0.0015 for vowels and 0.0012 for consonants in the 3-syllable word list. With this criterion, the remaining phonemes are well represented by the final 2-syllable and 3-syllable Filipino phonetically balanced word lists.

Based on the frequencies gathered, the phonemes /m:/, /g:/, /iw/, /ey/, and /uy/ were not well represented in the 2-syllable word list, and the phonemes /iw/, /ey/, /oy/, and /uy/ were not well represented in the 3-syllable word list.

4.2. Results of Filipino PBW

To the 20 basic phonemes of the Filipino language, 16 phonemes were added (10 long consonants and 6 diphthongs). Five phonemes were not represented in the 2-syllable word list (/m:/, /g:/, /iw/, /ey/, /uy/), since their frequencies are less than the acceptance values of 0.0023 for vowels and 0.0018 for consonants. Four phonemes were not represented in the 3-syllable word list (/iw/, /ey/, /oy/, /uy/), since their frequencies are less than the acceptance values of 0.0015 for vowels and 0.0012 for consonants. The tentative 3-syllable word list contains 214 words, and the tentative 2-syllable word list contains 261 words. Since the words with unrepresented phonemes were removed, four (4) words were removed from the 2-syllable word list and two (2) words from the 3-syllable word list. Thus, the final phonetically balanced word list for the Filipino language contains 257 2-syllable words and 212 3-syllable words.
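The accept/reject decision for the 2-syllable list can be checked against the published frequencies. This is a minimal sketch using the values from Table 11 and the 2-syllable acceptance values from Table 10; the dictionary and function names are hypothetical, and lower-case symbols are used for the phonemes.

```python
# Frequencies copied from Table 11 (2-syllable list).
CONSONANTS_2SYL = {
    "b": 0.0196, "k": 0.0311, "d": 0.0184, "g": 0.0175, "h": 0.0247,
    "l": 0.0308, "m": 0.0454, "n": 0.0642, "ŋ": 0.0779, "p": 0.0122,
    "r": 0.0128, "s": 0.0348, "t": 0.0357, "w": 0.0135, "y": 0.0367,
    "p:": 0.0089, "b:": 0.0115, "m:": 0.0002, "t:": 0.0049, "d:": 0.0075,
    "n:": 0.0167, "s:": 0.0021, "l:": 0.0096, "k:": 0.0041, "g:": 0.0009,
}
VOWELS_2SYL = {
    "a": 0.2460, "e": 0.0049, "i": 0.1261, "o": 0.0347, "u": 0.0279,
    "iw": 0.0000, "ay": 0.0119, "aw": 0.0041, "oy": 0.0028,
    "ey": 0.0000, "uy": 0.0000,
}


def rejected(freqs, acceptance_value):
    """Phonemes whose frequency falls below the acceptance value."""
    return sorted(p for p, f in freqs.items() if f < acceptance_value)


if __name__ == "__main__":
    bad_consonants = rejected(CONSONANTS_2SYL, 0.0018)
    bad_vowels = rejected(VOWELS_2SYL, 0.0023)
    represented = (len(CONSONANTS_2SYL) - len(bad_consonants)
                   + len(VOWELS_2SYL) - len(bad_vowels))
    print(bad_consonants, bad_vowels, represented)
```

Under these thresholds the sketch flags /m:/ and /g:/ among the consonants and /iw/, /ey/, /uy/ among the vowels, leaving 31 represented phonemes.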
*****
Table 13: Word count for the 3-syllable and 2-syllable lists with the number of represented phonemes

                           List 1                 List 2
                           3-syllable word list   2-syllable word list
Phonemes represented       32                     31
Number of words per list   212                    257

5. CONCLUSION

Filipino vowel phonemes were tested using continuous density HMM with MFCC for feature extraction, yielding an accuracy rate of 94.5% for dependent speakers and 90.8% for independent speakers. The results obtained from the independent speakers were lower than those obtained from the dependent speakers, since the independent speakers were not common to the training data.

The development of Filipino phonetically balanced words produced 257 words for the 2-syllable Filipino PBW list and 212 words for the 3-syllable Filipino PBW list. These words are to be used for the future development of a large-vocabulary speech recognizer of Filipino words using continuous density HMM speech recognition. The phonetically balanced word list will be developed further after including 3 additional textbooks from the Bagwis series used in the 1st to 3rd year of public high schools.

6. REFERENCES

1. Navaro, R. D., Recognition of Tagalog Alphabets Using The Hidden Markov Model, (2007).
2. Guevara, R., Co, M., Espina, E., Gracia, I., Tan, E., Ensomo, R., and Sagum, R., Development of a Filipino speech corpus, (2002).
3. Rabiner, L. R. and Juang, B. H., Fundamentals of Speech Recognition, Englewood Cliffs, NJ, Prentice Hall, (1993).
4. Kumar, K., Hindi Speech Recognition System using HTK, International Journal of Computing and Business Research (Vol 2, Issue 2), (2011).
5. Sakti, S., Isotani, R., Kawai, H., and Nakamura, S., The Use of Indonesian Speech Corpora for Developing a Filipino Continuous Speech Recognition System, (2010).
6. http://wika.pbworks.com/w/page/8021671/Kasaysayan, "Ebolusyong ng Alpabetong Filipino", (Retrieved 2012).
7. Lewis, P. M., Languages of Philippines. Ethnologue: Languages of the World (16th ed.), (2009).
8. Santiago, A., Tiangco, N., Makabagong Balarilang Pilipino, (1985).
9. Ferrer, O., Speech audiometry: a discrimination test for Spanish language, (1960).
10. Rosas y De Mendizabal, B., Speech Audiometry in English, Portuguese and Spanish, (1958).
11. Philippine Census, 2000. Table 11. Household Population by Ethnicity, Sex and Region: 2000, (2000).
12. Sagon, R., The development of a phonetically balanced word recognition test in the Ilocano language, (2006).
13. Travero, K., UP Diksyonaryong Filipino, 1st Edition, UP Publishing Press, Quezon City, (2001).
