
AUTOMATED TAJWEED CHECKING RULES ENGINE FOR

QURANIC VERSE RECITATION

NOOR JAMALIAH BINTI IBRAHIM

FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
UNIVERSITY OF MALAYA
KUALA LUMPUR
APRIL 2010
AUTOMATED TAJWEED CHECKING RULES ENGINE FOR
QURANIC VERSE RECITATION

NOOR JAMALIAH BINTI IBRAHIM

DISSERTATION SUBMITTED IN FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF MASTER OF COMPUTER SCIENCE

FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
UNIVERSITY OF MALAYA
KUALA LUMPUR
APRIL 2010
ORIGINAL LITERARY WORK DECLARATION

Name of Candidate: NOOR JAMALIAH BINTI IBRAHIM

I.C/Passport No: 840831-11-5602

Registration/Matric No: WGA 070122

Name of Degree: MASTER OF COMPUTER SCIENCE

Title of Project Paper/Research Report/Dissertation/Thesis (“this Work”):


AUTOMATED TAJWEED CHECKING RULES ENGINE FOR QURANIC VERSE
RECITATION

Field of Study: SPEECH RECOGNITION

I do solemnly and sincerely declare that:


1) I am the sole author/writer of this Work;
2) This Work is original;
3) Any use of any work in which copyright exists was done by way of fair dealing and for
permitted purposes and any excerpt or extract from, or reference to or reproduction of
any copyright work has been disclosed expressly and sufficiently and the title of the
Work and its authorship have been acknowledged in this Work;
4) I do not have any actual knowledge nor do I ought reasonably to know that the making
of this work constitutes an infringement of any copyright work;
5) I hereby assign all and every rights in the copyright to this Work to the University of
Malaya ("UM"), who henceforth shall be owner of the copyright in this Work and that
any reproduction or use in any form or by any means whatsoever is prohibited without
the written consent of UM having been first had and obtained;
6) I am fully aware that if in the course of making this Work I have infringed any copyright
whether intentionally or otherwise, I may be subject to legal action or any other action as
may be determined by UM.

Candidate's Signature                                        Date:

Subscribed and solemnly declared before,

Name: ZAIDI RAZAK                                            Date:
Designation: LECTURER,
COMPUTER SYSTEM & TECHNOLOGY DEPARTMENT,
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY,
UNIVERSITY OF MALAYA, 50603 KUALA LUMPUR
ABSTRACT

Automated speech recognition for Quranic verse recitation with Tajweed rule checking capabilities is a new research area. The current method of learning Al-Quran reading skills, a manual process, has become less effective and less attractive to implement, especially for the young Muslim generation. This method, known as the talaqqi and musyafahah method, is a face-to-face learning process between students (recitors) and teachers (Mudarris), in which listening, correction and repetition of the correct Al-Quran recitation take place in real time. An automated speech recognition system with Tajweed rule checking capability could be an alternative that supports this existing manual method of Quranic learning, without denying the main role of the Mudarris in teaching Al-Quran. The system is intended neither to replace the Al-Quran nor the role of the teachers, but to complement the teaching process and to ensure that the art of reciting Al-Quran is not lost and forgotten. In this thesis, an automated Tajweed checking rules engine for Quranic verse recitation was developed and tested, to offer Muslims an easier way to recite and learn Al-Quran with a better understanding of Tajweed. The Mel-frequency Cepstral Coefficients (MFCC) feature extraction technique is used to extract features and characteristics from the Quranic verse recitations, and the Hidden Markov Model (HMM) is used for training and recognition purposes. The most challenging task in this research was implementing Al-Quran within a speech recognition system, together with the engine's capability to check Tajweed rules. The engine achieved recognition rates of 91.95% (ayates) and 86.41% (phonemes), which indicates that its development was successfully implemented.

ABSTRAK

An Automatic Speech Recognition system with an application for evaluating the rules of Tajwid, specifically for the recitation of the holy verses of Al-Quran, has been developed; it represents a field that is still considered new. This research was carried out in response to problems in the existing Al-Quran learning and teaching system and in the method currently in use, which is manual and involves the students and the teachers (Mudarris) themselves. This method is believed to be less effective and less attractive to implement, especially for the young Muslim generation. The learning approach is adapted from one form of Al-Quran learning known as Talaqqi and Musyafahah, that is, face-to-face learning between students and teachers (Mudarris). Through this method, the whole Al-Quran learning process takes place: listening, correcting the recitation, and repeating the recitation fluently and with correct Tajwid. An automated system with the capability to evaluate the Tajwid rules of Al-Quran recitation is an alternative that supports the existing manual method of Al-Quran learning, without neglecting or disputing the primary role of the Mudarris in teaching Al-Quran. The system developed is not intended to replace the Al-Quran, nor to replace the role of the teacher; rather, its function is to complement the current learning process and to ensure that the art of Al-Quran recitation is neither eroded by time nor simply forgotten. In this thesis, a recognition engine for the Tajwid rules of Quranic verses has been developed and its capability tested, through the introduction of a new method that is easy for the Muslim community to use, giving a better understanding in learning Al-Quran. The Mel-frequency Cepstral Coefficient (MFCC) feature extraction technique is used in this study, where the features and characteristics of the recited holy verses of Al-Quran are extracted, while Hidden Markov Model (HMM) classification is used for training and recognition purposes. The most challenging task in this research was the implementation of the Quranic verses in the speech recognition system, together with its capability to check the Tajwid rules. Nevertheless, the engine developed achieved a high recognition rate exceeding 91.95% (verses) and 86.41% (words), showing that the engine was successfully implemented.

ACKNOWLEDGEMENTS

In the name of Allah, the Most Gracious, the Most Merciful.

All praise is due to Allah, the Creator and Sustainer of this whole universe, the

Most Beneficent and the Most Merciful, for His guidance and blessing and granting me

knowledge, patience and perseverance to accomplish this research successfully.

Firstly, I would like to acknowledge the University of Malaya, especially the Department of Computer System & Technology, for providing the support to carry out this research. I take great pride in forwarding my sincere appreciation and deepest gratitude to my supervisor, Mr. Zaidi Razak, for his valuable guidance, support, encouragement and effort throughout this research project. Without his tireless efforts, patience and guidance, this research could not have been completed successfully. My special thanks are also dedicated to my project leader, Prof. Dato’ Dr. Mohd Yakub @ Zulkifli Bin Haji Mohd Yusoff, for his valuable guidance and moral support throughout these tough years.

I would also like to take this opportunity to thank the University of Malaya for funding my studies under the University of Malaya Scholarship Scheme (SBUM). I am much honored to be a recipient of this scholarship, which supported me financially and funded my studies, enabling me to concentrate on my research project.

Last but not least, my most profound gratitude and respect go to my family, especially my beloved parents, Haji Ibrahim Bin Husain and Hajjah Maimunah Muda, who have been the ultimate source of my motivation to work hard and the inspiration of my life. I proudly dedicate this work to both of them; may Allah SWT bless them both.

April 2010

Noor Jamaliah Binti Ibrahim

Department of Computer System & Technology,

Faculty of Computer Science & Information Technology,

University of Malaya,

Kuala Lumpur.

TABLE OF CONTENTS

ABSTRACT
ABSTRAK
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS

CHAPTER 1: INTRODUCTION
1.1 Introduction
1.2 Background
1.3 Motivation
1.4 Problem Statements
1.5 Research Objectives
1.6 Scope of Research
1.7 Research Methodology
1.8 Terminology
1.8.1 Utterances
1.8.2 Vocabularies
1.8.3 Accuracy
1.9 Thesis Outline

CHAPTER 2: LITERATURE REVIEW
2.1 Introduction
2.2 The “Art of Tajweed”
2.3 Effect of the “Art of Tajweed” on the acoustic model
2.4 Linguistic properties of Arabic
2.5 Quranic Verse Recitation Recognition Systems
2.5.1 Pre-processing
2.5.1.1 Endpoint Detection
2.5.1.2 Pre-emphasis filtering/Noise filtering/Smoothing
2.5.1.3 Channel Normalization/Distortion Equalization
2.5.2 Feature Extraction
2.5.2.1 Linear Predictive Coding (LPC)
2.5.2.2 Perceptual Linear Prediction (PLP)
2.5.2.3 Mel-Frequency Cepstral Coefficient (MFCC)
2.5.2.4 Spectrographic Analysis
2.5.3 Training/Feature Classification and Pattern Recognition Techniques
2.5.3.1 Hidden Markov Model (HMM)
(a) HMM Training
(b) HMM Testing
2.5.3.2 Artificial Neural Network (ANN)
2.5.3.3 Vector Quantization (VQ)
2.5.4 Recognition/Identification
2.5.4.1 Hidden Markov Model (HMM)
2.5.4.2 Vector Quantization (VQ)
2.5.4.3 Artificial Neural Network (ANN)
2.6 Comparison of Speech Recognition techniques for Quranic Arabic recitation
2.7 Summary

CHAPTER 3: RESEARCH METHODOLOGY
3.1 Introduction
3.2 Tajweed checking rules engine techniques and algorithms
3.2.1 Speech Samples Collection (Speech Recording)
3.2.2 Mel-Frequency Cepstral Coefficients Feature Extraction
3.2.2.1 Preemphasis
3.2.2.2 Framing
3.2.2.3 Windowing
3.2.2.4 Discrete Fourier Transform (DFT)
3.2.2.5 Mel Filterbank
3.2.2.6 Discrete Cosine Transform (DCT)
3.2.3 Hidden Markov Model Classification
3.2.3.1 Hidden Markov Model Training
(a) Initialization
(b) Probability Evaluation
(c) Re-Estimation
(d) Result – Model of HMM
3.2.3.2 Hidden Markov Model Testing/Recognition
(a) Initialization
(b) Probability Evaluation
(c) HMM Recognition Result
3.3 Summary

CHAPTER 4: DESIGN AND IMPLEMENTATION
4.1 Introduction
4.2 Overview of Automated Tajweed Checking Rules Engine
4.2.1 Engine Development Part
4.2.2 Content Development Part
4.3 Tajweed checking rules engine architecture
4.4 Data Flow Diagram for Tajweed Checking Rules Engine
4.5 Tajweed Checking Rules Engine Flow Chart
4.6 Tajweed Checking Rules Graphical User Interfaces
4.7 Summary

CHAPTER 5: EXPERIMENTAL RESULTS AND DISCUSSION
5.1 Introduction
5.2 Speech Samples Collection (Recording Process)
5.3 Result of Feature Extraction
5.4 Result of Features Training
5.4.1 Tajweed Checking Rules Database
5.5 Result of Features Matching/Testing
5.5.1 Testing – Word (ayates) Like Template
5.5.2 Testing – Phonemes Like Template
5.6 Summary

CHAPTER 6: CONCLUSION & FUTURE ENHANCEMENT
6.1 Introduction
6.2 Significance and Contributions of Tajweed Checking Rules engine for Quranic verse Recitation
6.3 Observations on Weaknesses and Strengths
6.3.1 Strengths
6.3.2 Weaknesses
6.4 Future Research
6.5 Conclusion

REFERENCES
APPENDIX A
APPENDIX B: List of Published Papers and Achievements
LIST OF FIGURES

Figure 2.1: Arabic general characteristics
Figure 2.2: System architecture
Figure 2.3: Block diagram of the computation steps of MFCC
Figure 2.4: Interconnected group of nodes in ANN
Figure 2.5: The Encoding-Decoding Operation in VQ
Figure 3.1: MATLAB code for recording process
Figure 3.2: Block diagram of the computation steps of MFCC
Figure 3.3: Time and Spectrum graph for the recitation “Bismillahi Al-Rahmani Al-Rahim”
Figure 3.4: MATLAB code for the Preemphasis stage of MFCC
Figure 3.5: MATLAB code for the framing stage of MFCC
Figure 3.6: Framing Signal (Frame size = 256 samples)
Figure 3.7: MATLAB code for the windowing stage of MFCC
Figure 3.8: Hamming Window
Figure 3.9: Windowed speech segment
Figure 3.10: FFT computation of MATLAB code
Figure 3.11: MFCC Cepstral Coefficients computation of MATLAB code
Figure 3.12: Result of MFCC Cepstral Coefficients
Figure 3.13: The MFCC Cepstral Coefficients for ayates ‘Maaliki yawmid diini’
Figure 3.14: Automated Tajweed Checking Rules system structure
Figure 3.15: The HMM sequence of training block diagram
Figure 3.16: The state transition probability matrix (A) for ayates ‘Maaliki yawmid diini’
Figure 3.17: MATLAB code for initializing the model (mu, sigma)
Figure 3.18: M-File function of hmm_mint
Figure 3.19: The mean vectors mu (µ) for ayates ‘Maaliki yawmid diini’
Figure 3.20: The covariance matrices sigma (Σ) for ayates ‘Maaliki yawmid diini’
Figure 3.21: MATLAB code for Forward-Backward Recursions
Figure 3.22: MATLAB code for the re-estimation of transition parameters
Figure 3.23(a): MAT-file trained model of A_ values (1-14)
Figure 3.23(b): MAT-file trained model of mu_ (μ) values (1-13)
Figure 3.23(c): MAT-file trained model of sigma_ (Σ) values (1-13)
Figure 3.24: The HMM sequence of testing/recognition block diagram
Figure 3.25: MATLAB code for ‘realmin’
Figure 3.26(a): Output score for the ayates ‘Maaliki yawmiddiini’
Figure 3.26(b): Log-Likelihood Ratio (LLR) for the ayates ‘Maaliki yawmiddiini’
Figure 4.1: Automated Tajweed Checking Rules for Quranic verse recitation context diagram
Figure 4.2: Overview of Automated Tajweed Checking Rules Engine
Figure 4.3: Block diagram schematic illustrating Tajweed checking rules engine
Figure 4.4: Tajweed checking rules engine architecture
Figure 4.5: Tajweed Checking Rules Engine Data Flow Diagram (DFD)
Figure 4.6: Automated Tajweed checking rules engine for Quranic flow chart
Figure 4.7: Automated Tajweed Checking Rules Engine for Quranic verse Recitation Graphical User Interface
Figure 4.8: Load the wave file of input speech sample from sourate Al-Fatihah
Figure 4.9: Analyzing process of sourate Al-Fatihah using MFCC (Started)
Figure 4.10: Analyzing process of sourate Al-Fatihah using MFCC (Finished)
Figure 4.11: The input speech sample and spectrogram graph for ‘Bismillah’ utterance
Figure 4.12: The incorrect recitation of ‘Bismillah’ utterance (1st mistake/notification)
Figure 4.13: The incorrect recitation part involved and Tajweed rules
Figure 4.14: The incorrect recitation of ‘Bismillah’ utterance (2nd mistake/notification)
Figure 4.15: The incorrect recitation part involved and Tajweed rules
Figure 4.16: The correct recitation of ‘Bismillah’ utterance
Figure 4.17: The notification of correct recitation of ‘Arrahmaanirrahiim’ utterance
Figure 4.18: The correct recitation of ‘Arrahmaanirrahiim’ utterance
Figure 5.1: Percentage of accuracy for recognition rate (Ayates & Phonemes)
Figure 5.2: Percentage of Word Error Rate (WER) for ayates & Phonemes
LIST OF TABLES

Table 2.1: The Arabic alphabets
Table 2.1: The Arabic alphabets (continued)
Table 2.2: Arabic diacritics
Table 2.3: Arabic Consonants
Table 2.4: Approaches used by Quranic Arabic recitation using speech recognition techniques
Table 3.1: MFCC Parameter Definition
Table 3.2: MFCC Filter Equations
Table 5.1: Excerpt from the dictionary of Sourate Al-Fatihah
Table 5.2: Summary of the Total Collected Speech Samples for each Ayates
Table 5.3: Template Data of HMM Model for Collected Quranic Recitations
Table 5.4: The Tajweed Pronunciation rules in Sourate Al-Fatihah
Table 5.5: Result of Likelihood Ratio (LLR) for 8 recitations of speech samples (1.0 x 10^3)
Table 5.6: Test result for 8 recitations of speech samples (ayates of Sourate Al-Fatihah)
Table 5.7: Comparison between correct and incorrect Tajweed rules for ayates “Bismillahir <rahmaanir> rahimi”
Table 5.8: Comparison between correct and incorrect Tajweed rules for ayates “Bismillahir rahmaanir <rahiimi>”
Table 5.9: Test result for 28 recitations of speech samples (Phonemes)
LIST OF ABBREVIATIONS

ANN : Artificial Neural Network

ASR : Automatic Speech Recognizer

CN : Channel Normalization

DCT : Discrete Cosine Transform

DFT : Discrete Fourier Transform

FFT : Fast Fourier Transform

FIR : Finite Impulse Response

FS : Sampling Frequency

GUI : Graphical User Interface

HMM : Hidden Markov Model

Hz : Hertz

ICT : Information & Communication Technology

IDFT : Inverse Discrete Fourier Transform

IV : In Vocabulary

J-QAF : Jawi, Quran, Arabic and Fardhu Ain (Islamic obligatory duty)

LBG : Linde, Buzo and Gray

LLR : Log Likelihood Ratio

LPC : Linear Predictive Coding

MFCC : Mel-frequency Cepstral Coefficients

MSA : Modern Standard Arabic

NN : Neural Network

OOV : Out of Vocabulary

PLP : Perceptual Linear Prediction

PC : Personal Computer

VQ : Vector Quantization

WER : Word Error Rate

CHAPTER 1

INTRODUCTION

1.1 Introduction

In this technological era, information technology has made a great impact on our daily lives, and communication between humans and machines has become a critical problem. Until now, this communication has been carried out almost entirely through keyboards and screens, which have weaknesses and limitations when applied to other applications. Speech is the most widely used and most natural means of communication between humans, and it is an obvious substitute for keyboards and screens in the communication process. Although speech applications in the human-computer interface area have grown drastically, the capability of machines to generate and interpret speech is still incomplete and imperfect. Investigations in this research field have led to the development of automatic speech recognition systems.

Speech recognition is in high demand and has many useful applications. This research draws on several components of Artificial Intelligence: natural language processing, speech recognition technology and the fundamentals of human-computer interaction. In particular, it is concerned with speech recognition technology, which is part of speech and signal processing technology.

1.2 Background

In learning Al-Quran as shown by our Prophet, different systems and methodologies are essential to put the word of God in its rightful place. The development of Quranic lessons has produced many Quranic scholars and, at the same time, raised the standard of Quranic learning to a high priority. The development of ICT has also changed the world in many ways, both positive and negative. Therefore, every Muslim must be able to identify appropriate and practical ways of selecting the right kind of information obtained from this new technology. Even though the world has changed drastically, Quranic studies have never become outdated, and neither globalization nor high technology has prevented academia in Quranic studies from being influenced by the current trends of technology.

This research focuses only on the speech processing of Quranic recitation, specifically recitation recognition based on the 'Rules of Tajweed'. It is believed that such a recognition system can educate students and adults through an interactive learning system with a Tajweed checking rules (Al-Quran reading rules) correction capability. Moreover, the existing products and technologies are only capable of displaying Al-Quran text and/or playing stored Al-Quran recitations, whereas this system allows students to recite Al-Quran and have their recitation revised and corrected in real time.

It is believed that the Al-Quran learning process requires a special and effective way of reciting Al-Quran (Tabbal et al., 2006). Furthermore, the Al-Quran learning process is still handled by a manual method, based on Al-Quran reading skills, through the talaqqi and musyafahah methods. These methods are face-to-face learning between students and teachers (Mudarris), in which listening, correction of the Al-Quran recitation and repetition of the corrected recitation take place (Berita Harian, 2005). This method is important because it teaches Muslims how the hijaiyah letters are correctly pronounced. The process can only take place if both the Mudarris and the recitors follow the art, rules and regulations of reading Al-Quran, known as the 'Rules of Tajweed' (Tabbal et al., 2006).

1.3 Motivation

The motivations of this research project are:

(i) In learning Al-Quran, recitors are required to learn through the manual talaqqi and musyafahah method of Al-Quran reading skills, in which the Mudarris must check the Tajweed of each student individually. This creates problems for the Mudarris in controlling and handling classes with large numbers of students; the objectives targeted in j-QAF are therefore difficult to achieve within the constrained time schedule provided for completing the specified syllabus (Tasmik & Khatam Al-Quran module).

(ii) A shortage of ICT applications in the teaching and learning process may degrade the quality of students' performance; students easily become bored and lose interest in participating in class.

(iii) The current busy lifestyle needs a modern, technological approach to self-learning of Al-Quran recitation, which can improve the Quranic learning process and optimize study time.

1.4 Problem Statements

The problem statement of this research project is:

(i) No automated Tajweed checking rules engine yet exists as a learning tool that is independently capable of evaluating a user's reading and performance.

1.5 Research Objectives

The objectives of this research project are:

(i) To identify the most suitable algorithms for feature extraction and recognition to be implemented in the Tajweed checking rules engine.

(ii) To determine the most accurate recognition process that suits Quranic verse recitation.

(iii) To develop an engine that combines feature extraction and recognition into a new automated Tajweed checking rules system.

1.6 Scope of Research

The Tajweed checking rules engine only checks the basic rules of Tajweed and “Mad” in Quranic recitation, namely:

(i) Basic rules (Idgham: Bila Ghunnah, Ma'al Ghunnah, Syamsi; Izhar: Halqi, Syafawi, Qamari; Iqlab; and Ikhfa' Haqiqi)

(ii) Mad Asli and Mad Arid Lissukun

This project is a 100% software-based system and does not involve any hardware implementation; only MATLAB coding, simulation and GUI modeling are involved in this research.

1.7 Research Methodology

This automated Tajweed checking rules engine for Quranic verse recitation is designed mainly to guide and assist users, specifically Muslim users, in reading Al-Quran. The aim of the system is to facilitate recitors during the Al-Quran learning process, focusing on Quranic recitation based on the 'Rules of Tajweed'. In other words, the system checks the Tajweed rules against a stored database and recognizes the particular sourate of Al-Quran, which a recitor may recite either correctly or incorrectly according to the Tajweed guidelines. This research is carried out in the following stages (sketched in code after this list):

(i) Collect input speech samples from different recitors.

(ii) Extract features from the collected Quranic recitation speech samples and produce a set of feature vectors.

(iii) Train the feature vectors against the initial/available database in order to build a unique database/model.

(iv) Recognize/match and test unknown feature vectors against the trained database in order to obtain the recognition accuracy.

(v) Evaluate the performance of the Quranic recitation recognition engine.
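As an illustrative summary only, the stages above can be sketched in MATLAB roughly as follows; the names extract_mfcc, hmm_train and hmm_score are placeholders standing in for the routines developed in Chapter 3, and the file names are assumptions, not artifacts of this project:

    % Hypothetical pipeline sketch for stages (i)-(v); names are placeholders.
    fs = 16000;                                    % assumed sampling frequency
    files = dir('recitations/*.wav');              % (i) collected speech samples
    feats = cell(1, numel(files));
    for k = 1:numel(files)
        x = audioread(fullfile('recitations', files(k).name));
        feats{k} = extract_mfcc(x, fs);            % (ii) MFCC feature vectors
    end
    model = hmm_train(feats);                      % (iii) train the HMM database/model
    test = extract_mfcc(audioread('test.wav'), fs);
    score = hmm_score(model, test);                % (iv) match unknown features
    fprintf('Log-likelihood: %.2f\n', score);      % (v) evaluate performance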

1.8 Terminology

The following definitions cover the basics needed to understand speech recognition technology. They also describe constraints and difficulties encountered by a speech recognition system when dealing with the Quranic Arabic language.

1.8.1 Utterances

An utterance is the vocalization of a word that represents a single meaning to the computer. An utterance can be a single word, a few words, a sentence or even multiple sentences, as long as it carries a single meaning to the computer (Oxford English Dictionary, 11th Edition). Here, the variability of Quranic Arabic can be caused by dialectal differences: variation between Arabic countries, and even dialectal differences within the same country, cause words to be pronounced in different ways.

1.8.2 Vocabularies

A vocabulary, also known as a dictionary, is the list of words or utterances that can be recognized by the speech recognition system (Oxford English Dictionary, 11th Edition). Small dictionaries are easier for the computer to recognize, while large dictionaries are more difficult. Moreover, the Arabic language is morphologically rich, causing a high vocabulary growth rate. This high growth rate is problematic for language models because it produces a large number of out-of-vocabulary words.

1.8.3 Accuracy

The efficiency and ability of a recognizer can be determined by measuring its accuracy. This includes not only correctly identifying utterances, but also identifying whether a spoken utterance is in the system's vocabulary. The acceptable accuracy of a system depends on the application.
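Although this section gives no formula, the standard measure used later in Chapter 5 can be stated here for reference: the Word Error Rate is WER = (S + D + I) / N x 100%, where S, D and I are the numbers of substituted, deleted and inserted words measured against a reference transcription of N words, and recognition accuracy can then be taken as 100% - WER.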

1.9 Thesis Outline

This thesis contains 6 chapters, including this introductory chapter. Each chapter is subject to certain scopes, which formulate the thesis contents. Below are the chapter numbers, titles and summaries of each chapter in this thesis.

Chapter 1: Introduction

Chapter 1 presents the definition and background of the project, including the problem statements, research objectives, scope of research, research methodology, terminology and thesis outline, which define the scope and coverage of the project.

Chapter 2: Literature Review

Chapter 2 highlights the key related research, algorithms and techniques relevant to this work, in terms of the commonly used feature extraction, classification and pattern matching techniques, and provides an overview of current research related to speech recognition systems.

Chapter 3: Research Methodology

Chapter 3 provides a description and explanation of the research methodology used in this work, including the procedures of the main techniques adopted and the algorithms used.

Chapter 4: Design and Implementation

Chapter 4 provides the architecture design of the Automated Tajweed Checking rules engine for Quranic verse recitation. Its sub-topics include the design of the engine and its implementation, as well as the diagrams that represent the logical and physical designs of the system.

Chapter 5: Experimental Result and Discussion

Chapter 5 contains the experimental data and results, together with additional information, analysis and discussion of the results obtained after the training and testing procedures were executed, and evaluates the performance of the overall system.

Chapter 6: Conclusion and Future Enhancement

Chapter 6 summarizes the work accomplished and discusses the possibilities and recommendations for future work.

Appendix A:

Appendix A contains all the signals of the ayates in Sourate Al-Fatihah, obtained from the MATLAB simulation.

Appendix B:

Appendix B contains a list of achievements and participation in both international and national conferences, as well as competitions and exhibitions.

CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

Speech recognition is one of the most important areas in digital signal processing. Its scope also involves the 'artificial intelligence' of systems or machines that are able to 'hear' and 'understand' spoken information from a particular recitation. Automatic speech recognition has reached a very high standard of performance over the past five years, and it is a highly demanded technology with many useful applications. The main area believed to contribute to the effectiveness of this research is pattern recognition technology. Speech recognition belongs to a much broader scientific topic called pattern recognition or pattern matching. According to Huang et al. (2001), spoken language processing relies on pattern recognition, which is one of the most challenging problems for machines.

In this chapter, the general concepts related to the Quranic Arabic accent are reviewed, and the motivations for conducting this research are presented. First, the art of Tajweed in Al-Quran is discussed and presented: a short description of the 'art' that makes the language recognizably unique through a set of pronunciation rules, known as the Rules of Tajweed. Secondly, a brief overview of prosody is given; experimental studies from the literature are presented, showing the gap and differences between written and recited Al-Quran. Next comes a brief discussion of the effect of the “Art of Tajweed” on the acoustic model, which can influence the recitation recognition aspect of checking Tajweed rules; these effects are related to the Arabic linguistic properties discussed in detail in Section 2.4. Finally, key research related to this work is highlighted, together with the relevant algorithms and techniques; various types of feature extraction, classification and matching techniques are also discussed in this chapter.

2.2 The “Art of Tajweed”

“Tajweed” is an Arabic word meaning proper pronunciation during recitation, as well as recitation at a moderate speed. It is a set of rules that govern how Al-Quran should be read (Bashir, M.S. et al., 2003). It is considered an art because not all recitors perform the same Quranic verse in the same way (Tabbal, H. et al., 2006). The “art of Tajweed” defines a set of flexible, well-defined rules for reciting Al-Quran. These rules create a big difference between normal Arabic speech and recited Quranic verses, which makes the impact of this “art” on the automatic speech recognition process, especially on the acoustic model, an interesting subject of analysis. Furthermore, teaching the “Art of Tajweed” manually requires a great deal of work and has proved unable to adapt to new recitors. Nevertheless, it is still believed that the special way to recite Al-Quran is through the art of Tajweed (Bashir, M.S. et al., 2003).

2.3 Effect of the “Art of Tajweed” on the acoustic model

As we already know, each person's voice is different. Thus, the sound of Al-Quran recited by different recitors will tend to differ considerably from one person to another. Even when Quranic sentences are taken from the same verse, the way they are recited or delivered may differ (Tabbal, H. et al., 2006), producing different sounds for different recitors. Moreover, many difficulties arise from the specialties of the Arabic language in Al-Quran, regarding the differences between written and recited Al-Quran: the same combinations of letters may be pronounced differently due to the use of harakattes (Tabbal, H. et al., 2006). The most important Tajweed rules believed to influence the recitation recognition aspect are:

i) Necessary prolongation of 6 vowels.
ii) Obligatory prolongation of 4 or 5 vowels.
iii) Permissible prolongation of 2, 4 or 6 vowels.
iv) Normal prolongation of 2 vowels.
v) Nasalization (ghunnah) of 2 vowels.
vi) Silent unannounced letters.
vii) Emphatic pronunciation of the letter R.

The above laws are based on the specific recitation rules. Moreover, predefined “maqams” are also used by recitors to vary the tone of their recitations (Tabbal, H. et al., 2006). There are 10 different sets of laws according to the 10 certified scholars who teach the recitation of the Holy Quran, namely Hafs, Kaloun, Warsh, Shu’bah, Hicham, Ibn-Dhakwan, Al-Duri, Al-Susi, Al-Bazzi and Kunbul (Habash, M., 1998). To deal with these laws, a prolongation needs to be considered as the repetition of the vowel the corresponding number of times, and likewise for nasalization (Tabbal, H. et al., 2006). These rules govern consonant/vowel combinations, the usage of short and long vowels, the co-articulation effect of emphatics and pharyngeals, pronunciations, the Tanween and Ghonna rules, as well as rules for combining words (Ahmed, M.E., 1991). Note that any echo produced during the recording of Quranic recitation is considered noise, which can be eliminated using a noise-cancelling filter (Tabbal, H. et al., 2006).

2.4 Linguistic properties of Arabic

Arabic is an official language in more than 22 countries. Since it is also the language of religious instruction in Islam, many more speakers have at least a passive knowledge of it. Arabic is one of the languages often described as morphologically complex, and the problem of language modeling for Arabic is compounded by dialectal variation (Vergyri, D. & Kirchhoff, K., 2004; Maamouri, M. et al., 2006; Kirchhoff, K. et al., 2004). However, only Modern Standard Arabic (MSA) is used for written and formal communication, because only MSA has a universally agreed writing standard for communication purposes (Vergyri, D. & Kirchhoff, K., 2004; Maamouri, M. et al., 2006; Kirchhoff, K. et al., 2004; Kirchhoff, K., 2002).

As mentioned earlier in Section 2.3, many difficulties arise from the specialties of the Arabic language in Al-Quran, due to the differences between written and recited Al-Quran (Tabbal, H. et al., 2006; Maamouri, M. et al., 2006; Kirchhoff, K. et al., 2004). The Quranic Arabic alphabet consists of 28 letters, known as the hijaiyah letters, from alif (ا) to ya (ي) (Vergyri, D. & Kirchhoff, K., 2004; Kirchhoff, K. et al., 2004). These comprise 25 letters representing consonants and 3 letters for the vowels (/i:/, /a:/, /u:/) with the corresponding semivowels (/y/ and /w/), where applicable. A letter can have two to four different shapes: isolated, beginning of a (sub)word, middle of a (sub)word and end of a (sub)word (Kirchhoff, K. et al., 2004). Letters are mostly connected, and there is no capitalization. The letters are shown in their various forms in Table 2.1.

Table 2.1: The Arabic alphabets (from Ramzi, A.H. & Omar, E.A., 2007)

Table 2.1: The Arabic alphabets (Continued)
Furthermore, other aspects of pronunciation are marked by diacritics, such as consonant doubling (phonemic in Arabic), indicated by the “shadda” sign, and the “tanween”, word-final adverbial markers that add /n/ to the pronunciation (Maamouri, M. et al., 2006; Kirchhoff, K., 2004), as shown in Table 2.2. These signs reflect differences of pronunciation. Moreover, the diacritics are very important in establishing grammatical functions, leading to acceptable text understanding and correct reading and analysis (Maamouri, M. et al., 2006). The entire set of diacritics is listed in Table 2.2 below:

Table 2.2: Arabic diacritics (from Vergyri, D. & Kirchhoff, K., 2004)
Some Arabic letters may carry an additional character called Hamza. Another non-basic character is Taa-Marbuwta, which always appears at the end of a word. The Arabic language has a very large vocabulary. Arabic characters may have diacritics, written as strokes above or below the character, that can change the pronunciation and meaning of the word; however, they are usually omitted in handwriting.

Figure 2.1: Arabic general characteristics

According to Figure 2.1 shown above, each number represents a certain characteristic, as listed below:

1. Writing direction
2. Ascenders
3. Descenders
4. Holes (loops)
5. Secondary parts (dots/diacritics)
6. Ligatures
7. Connected components (sub-words)
8. Turning points
9. Different letter forms with regard to their position within the word (Sari, T. et al., 2002)

According to Table 2.3, the Arabic language is characterized by a relatively large number of back consonants, which can cause a complex co-articulation phenomenon in Arabic speech. Besides, a set of allophones as well as the consonant letters (Ahmed, M.E., 1991; Youssef, A. & Emam, O., 2004) have been described and divided into several groups, classified as below:

Table 2.3: Arabic Consonants (from Ahmed, M.E., 1991)

Group A: The Emphatic Consonants: /T/, /S/, /D/, and /∂/.
Group B: The Pharyngeals: /q/, /x/, and /γ/; and /r/.

Group C: The Madd letters: Alif, “‫“ أ‬, Ya’a, “‫”ی‬, Waw, “‫”ﯣ‬.

Group D: The rest of the letters (except the pharyngealized Lam /L/).

Group E: Glottal/Pharyngeals (Al-Ezhar letters); /E/, /h/, /H/, /? /, /x/, /γ/,

Group F: Ash-Shamsi letters: /t/, /Ө/, /d/, /∂/, /z/, /s/, /∫/, /S/, /D/, /T/, /∂/, /l/, /n/.

Group G: Al-Qamari letters: /E/, /b/, /dz/, /H/, /x/, /? /, /γ/, /f/, /q/, /k/, /m/, /w/, /h/

Group H: Muqalqal letters (aspirated): /q/, /T/, /b/, /dz/, /d/.

Group I: Ikhfa’a letters: /t/, /Ө/, /s/, /∫/, /dz/, /d/, /∂/, /z/, /S/, /D/, /T/, /∂/, /f/, /k/, /q/.

Group J: Voiceless Fricative consonants: /f/, /Ө/, /s/, /∫/, /h/, /H/, /S/, /x/.

Group K: Stops: /D/, /d/, /t/, /T/, /k/, /q/, /b/.

Group L: The Consonants: /dz/, /q/, /k/.

Letter-to-sound conversion for Arabic usually has a simple one-to-one mapping between orthography and phonetic transcription, given correct diacritics. 14 vowels are used to accommodate short and long vowels, as well as the emphatic vowels. Each syllable begins with a consonant followed by a vowel, so syllables are limited in form and easily detectable. Short vowels are denoted by “V” and long vowels by “V:” (Ahmed, M.E., 1991; Youssef, A. & Emam, O., 2004; Essa, O., 1998). Syllables can be classified according to their length, also known as harakattes (Tabbal, H. et al., 2006):

CV      short; open
CV:     long; open
CVC     long; closed
CV:C    long; closed
CVCC    long; closed
CV:CC   long; closed

2.5 Quranic Verse Recitation Recognition Systems

This project mainly focuses on basic speech recognition technology, but applies it to a different type of application and language: Arabic in Al-Quran. Quranic Arabic recitation is best described as long, slow-paced, rhythmic, monotone utterances (Kirchhoff, K. et al., 2003). The sound of Quranic recitation is recognizably unique and reproducible according to a set of pronunciation rules, Tajweed, designed for clear and accurate presentation of the text.

Tabbal, H. et al. (2006) have already explored the implementation of Quranic verse recitation recognition, covering an Al-Quran verse delimitation system for audio files using speech recognition techniques. Quranic recitation and pronunciation, as well as the software used for recognition purposes, were discussed there. Their Automatic Speech Recognizer (ASR) was developed using the open-source Sphinx framework as the basis of the research. The scope of that project focused on an automated delimiter that can extract verses from audio files. The techniques for each phase were discussed and evaluated using implementations for different recitors reciting sourate “Al-Ikhlas”. In this way, the most important Tajweed and Tarteel rules, which may influence the recognition of a specific recitation, could be specified.

A comprehensive evaluation of Quranic verse recitation recognition techniques was provided by Ahmad, A.M. et al. (2004). The survey gives recognition rates and descriptions of the test data for the approaches considered, and places Quranic Arabic recitation recognition in context with background on the area, discussion of the techniques and potential research directions. There, a Recurrent Neural Network with backpropagation through time was implemented for speech recognition, and the differences between the Arabic letters from alif (ا) to ya (ي) were observed based on cepstral analysis and recognition effectiveness. In general, a speech recognition system consists of the following major stages, and Quranic Arabic recitation recognition can be implemented using the same techniques:

1. Pre-Processing
2. Feature Extraction
3. Training/Feature Classification
4. Recognition/Identification

This can be described by the system architecture shown below:

Figure 2.2: System architecture
2.5.1 Pre-Processing

In order to improve readability and the automatic recognition of speech, pre-processing steps are essential. The main benefit of pre-processing in speech recognition is to organize the information and simplify the subsequent task of recognition. The pre-processing steps mainly consist of the following:

1. Endpoint Detection
2. Pre-Emphasis Filtering/Noise Filtering/Smoothing
3. Channel Normalization/Distortion Equalization

2.5.1.1 Endpoint Detection

Short-time energy or spectral energy is usually used as the primary feature, augmented by zero-crossing rate, pitch and duration information, in endpoint detection algorithms. However, these endpoint detection features become less reliable in the presence of non-stationary noise and various types of sound artifact (Shen, J. et al., 1998), because the detection and verification of speech segments becomes relatively difficult in noisy environments.
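As a hedged illustration of the two primary features named above (a minimal sketch, not drawn from any of the cited systems, with assumed frame sizes and untuned thresholds), short-time energy and zero-crossing rate can be computed frame by frame in MATLAB and thresholded to mark speech frames:

    % Minimal endpoint-detection sketch; thresholds are assumed, not tuned.
    x = audioread('recitation.wav');               % placeholder file name, mono assumed
    frameLen = 256; hop = 128;                     % assumed frame parameters
    nFrames = floor((length(x) - frameLen)/hop) + 1;
    energy = zeros(1, nFrames); zcr = zeros(1, nFrames);
    for k = 1:nFrames
        f = x((k-1)*hop + (1:frameLen));
        energy(k) = sum(f.^2);                             % short-time energy
        zcr(k) = sum(abs(diff(sign(f)))) / (2*frameLen);   % zero-crossing rate
    end
    isSpeech = energy > 0.1*max(energy) & zcr < 0.25;      % crude speech mask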

2.5.1.2 Pre-Emphasis Filtering/Noise Filtering/Smoothing

The purpose of the smoothing stage is to decrease noise and regularize the word contours. Ahmad, A.M. et al. (2004) also digitized the Arabic alphabet uttered by speakers and applied digital filtering, which emphasizes the important frequency components of the signal; the start and end points can then be analyzed from the phoneme signals. There, the GoldWave audio editor software was used to filter the input speech signal and to analyze the start and end points that contain the speech information.

According to Tabbal, H. et al. (2006), a 2-stage pre-emphasis filter with two different factor values (0.92 and 0.97) could increase the recognition ratio of some audio files, given the chosen speech frame of 10 ms and a threshold of 10 dB for the speech extractor. It can also serve as a noise-cancelling filter to eliminate echo (noise).
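Such a pre-emphasis filter is a one-tap FIR high-pass filter; a minimal MATLAB sketch, assuming the two factors quoted above are applied in cascade, is:

    % Two-stage pre-emphasis sketch: y(n) = x(n) - a*x(n-1) per stage.
    x = audioread('recitation.wav');     % placeholder file name
    y = filter([1 -0.92], 1, x);         % first stage, factor 0.92
    y = filter([1 -0.97], 1, y);         % second stage, factor 0.97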

Besides pre-emphasis filtering, another technique was used by Kirchhoff, K. et al. (2004): Kneser-Ney smoothing. Kneser-Ney smoothing was used to build trigram language models for each stream with different morphology, and it is believed to outperform other smoothing methods consistently, including in noisy environments.

2.5.1.3 Channel Normalization/ Distortion Equalization

Another pre-processing approach is known as Channel Normalization. According to de Veth, J. & Boves, L. (1998), Channel Normalization (CN) techniques have been developed for application domains where a particular recognizer is trained on speech recorded with one microphone and recognition is attempted on speech recorded with a different microphone. The detailed contribution of channel normalization during training is still unknown, but it remains constant during testing.

2.5.2 Feature Extraction

Feature extraction is the process of extracting measurements from the input in order to differentiate among classes. Its main objective is to extract characteristics from the speech signal that are unique, discriminative, robust and computationally efficient for each word, and that can then be used to differentiate between different words (Ursin, M., 2002). According to Martens, J.P. (2002), there are various speech feature extraction techniques:

1. Linear Predictive Coding (LPC)
2. Perceptual Linear Prediction (PLP)
3. Mel-Frequency Cepstral Coefficient (MFCC)
4. Spectrographic Analysis

2.5.2.1 Linear Predictive Coding (LPC)

Ahmad, A.M. et al. (2004) used this type of extraction technique to extract LPC coefficients from the speech tokens. The coefficients are then converted to cepstral coefficients, which serve as input to neural networks. A drawback of LPC is its high sensitivity to quantization noise; converting the LPC coefficients into cepstral coefficients decreases the sensitivity of the high- and low-order cepstral coefficients to noise.
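As a hedged sketch of that conversion (the standard LPC-to-cepstrum recursion from Rabiner & Juang (1993), not code from Ahmad, A.M. et al.), using MATLAB's lpc from the Signal Processing Toolbox:

    % LPC analysis of one windowed frame, then conversion to cepstral coefficients.
    x = audioread('recitation.wav');      % placeholder file name
    frame = x(1:256) .* hamming(256);     % one analysis frame (assumed length)
    p = 12;                               % assumed LPC order
    a = lpc(frame, p);                    % a = [1, a(2)..a(p+1)]; predictor coeffs are -a(2:end)
    alpha = -a(2:end);
    c = zeros(1, p);                      % LPC-derived cepstral coefficients
    for m = 1:p
        c(m) = alpha(m);
        for k = 1:m-1
            c(m) = c(m) + (k/m) * c(k) * alpha(m-k);
        end
    end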

According to Ahmed, M.E. (1991), the LPC model was replaced by a formant model with a much wider frequency spectrum. The LPC synthesis model is believed to give poor results when used to deduce prosodic rules, and these rules are the important missing blocks needed to construct an allophone-based Arabic text-to-speech by rules.

2.5.2.2 Perceptual Linear Prediction (PLP)

Another popular feature set is Perceptual Linear Prediction (PLP) coefficients, used by Vuuren, S.V. (1996), who compared the discriminability and robustness against noise of both Perceptual Linear Prediction (PLP) and Linear Predictive Coding (LPC). In PLP, the spectral scale is the non-linear Bark scale, and the spectral features are smoothed within the frequency bands.

PLP was first introduced by Hermansky, H. (1990), who formulated PLP feature extraction as a method for deriving a more auditory-like spectrum, based on linear predictive analysis with engineering approximations to the psychophysical attributes of the human hearing process.

2.5.2.3 Mel-Frequency Cepstral Coefficient (MFCC)

One purpose of this research is to convert the speech waveform into a parametric representation; hence the viability of the Mel-Frequency Cepstral Coefficient (MFCC) technique for extracting features from Quranic verse recitation is explored and investigated. MFCC is perhaps the most popular feature extraction method in recent use (Bateman, D. et al., 1992; Ehab, M. et al., 2007), and it is one of the most popular feature extraction techniques in speech recognition, being based on the frequency domain of the Mel scale, which models the human ear (Chetouani, M. et al., 2002). MFCCs are based on the known variation of the human ear's critical bandwidths with frequency. The speech signal is expressed on the Mel frequency scale in order to capture the important phonetic characteristics of speech. This scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz. The normal speech waveform may vary from time to time depending on the physical condition of the speaker's vocal cords; MFCCs are less susceptible to such variations than the speech waveforms themselves (Rabiner, L. & Juang, B.H., 1993).

In the research conducted by Ahmad, A.M. et al. (2004), the Mel scale was used to perform filterbank processing on the power spectrum, after the windowing and FFT stages had been applied. A similar approach was taken by Tabbal, H. et al. (2006). The use of MFCC has produced remarkable results in the field of speech recognition, because it emulates the behavior of the auditory system by transforming the frequency axis from a linear to a non-linear scale.

According to Youssef, A. & Emam, O. (2004), 12-dimensional Mel Frequency Cepstral Coefficients (MFCCs) were coded for the recorded speech data. Pitch marks were produced using a wavelet transform approach on the glottal closure signal, obtained from a professional speaker during the recording process. Khalifa, O. et al. (2004) identified the main steps of MFCC computation, shown clearly in Figure 2.3 below:

Figure 2.3: Block diagram of the computation steps of MFCC

According to Figure 2.3, MFCC computation consists of the following steps:

1. Preprocessing
2. Framing
3. Windowing
4. DFT
5. Mel-Filterbank
6. Logarithm
7. Inverse DFT

Similarly, Hasan, M.R. et al. (2004) used MFCCs for feature extraction in a security system based on speaker identification, where the pitch of the speech signal is measured on the ‘Mel’ scale. The Mel-frequency scale is based on the mathematical equation shown below:

Mel(f) = 2595 * log10(1 + f / 700)    (1)
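To make equation (1) concrete, here is a small hedged MATLAB sketch (assumed parameters, not taken from Hasan, M.R. et al.) that maps frequencies to the Mel scale and back, as is done when placing mel filterbank centres:

    % Hz <-> Mel conversion per equation (1).
    hz2mel = @(f) 2595 * log10(1 + f/700);
    mel2hz = @(m) 700 * (10.^(m/2595) - 1);
    fs = 16000;                                  % assumed sampling rate
    % 20 triangular filters need 22 edge frequencies, equally spaced in Mel:
    edges = mel2hz(linspace(hz2mel(0), hz2mel(fs/2), 22));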

2.5.2.4 Spectrographic Analysis

There are a few Arabic speech recognition systems, normally speaker dependent, that use different techniques such as formant values and their trends. Automated spectrographic analysis provides better results than simple formant values. The research objective of Bashir, M.S. et al. (2003) was to implement a feature extraction strategy for Arabic phoneme identification through spectrographic analysis. According to that research, each phoneme of the Arabic language is represented by particular distinct bands within the spectrogram, and the particular phoneme is determined from the specified frequency bands. Based on the results, speech processing using spectrograms gives more accurate results than other conventional techniques; however, spectrographic analysis is believed to take more time to execute and to be difficult to automate, especially in speech processing.
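For reference, a spectrogram of the kind analyzed in that work can be produced with MATLAB's Signal Processing Toolbox; a minimal sketch, with assumed window parameters:

    % Spectrographic analysis sketch: short-time Fourier magnitude on a dB scale.
    [x, fs] = audioread('recitation.wav');                % placeholder file name
    spectrogram(x, hamming(256), 128, 512, fs, 'yaxis');  % 256-point frames, 50% overlap
    title('Spectrogram of a recited phoneme');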

2.5.3 Training / Feature Classification and Pattern Recognition Techniques

According to Huang, X. et al. (2001), spoken language processing relies heavily on pattern recognition, which is one of the most challenging problems for machines. The main objective of pattern recognition is to classify an object of interest into one of a number of categories or classes. The objects of interest are known as patterns; in this case, the classes are the individual words. Since the classification procedure in this research is applied to extracted features, it can also be called feature matching. The pattern matching techniques used for recognition purposes are of three types:

1. Hidden Markov Model (HMM)
2. Artificial Neural Network (ANN)
3. Vector Quantization (VQ)

2.5.3.1 Hidden Markov Model (HMM)

Nathan, K. et al. (1995) implemented HMMs for recognizing handwritten words captured from a tablet, because Hidden Markov Models (HMMs) had already been successfully applied to speech recognition systems. In the system of Tabbal, H. et al. (2006), the output of the front-end was used to feed the Sphinx core recognizer, which uses the Hidden Markov Model (HMM) as its recognition tool; the recognizer results were mapped through a hash map to the corresponding Arabic words. An HMM generates a discrete-time random process consisting of two sequences of random variables: the hidden states and the known observations. The underlying structure of an HMM is a set of states associated with probabilities of transitions between states, known as a Markov chain (Hansen, J.C., 2003).

On the other hand, the acoustic decision trees used in synthesis are built from the HMM alignment. This alignment was done by Youssef, A. & Emam, O. (2004), where acoustic, energy, pitch and duration trees were developed and executed with the efficient maximum-likelihood algorithms that exist for HMM training and recognition (Lee, K.F. & Hon, H.W., 1989).

(a) HMM Training

The Baum-Welch, or Forward-Backward, algorithm was introduced for training HMMs. All the HMM algorithms play a crucial role in an ASR (Automatic Speech Recognizer), where states, transitions and observations map onto the speech recognition task, and extensions to the Baum-Welch algorithm are needed to deal with spoken language. This method was implemented by Jurafsky, D. & Martin, J.H. (2007): speech recognition systems train each phone HMM embedded in an entire sentence, so that the segmentation and phone alignment are done automatically as part of the training procedure. Each word of the vocabulary to be recognized is modeled by a distinct HMM, and each word in the vocabulary has a training set of k utterances by different speakers (Rabiner, L. & Juang, B.H., 1993). The HMM model parameters (A, B, π) must be estimated so as to maximize the likelihood of the training set.
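As a hedged sketch of the likelihood evaluation that Baum-Welch re-estimation is built on, here is the forward algorithm for a toy discrete-observation HMM in MATLAB (the thesis itself uses Gaussian observation densities, so this is a simplification with assumed toy values):

    % Forward algorithm: P(O | model) for a discrete HMM with parameters (A, B, pi).
    A  = [0.7 0.3; 0.2 0.8];        % state transition probabilities (toy values)
    B  = [0.9 0.1; 0.3 0.7];        % B(j,k): prob. of symbol k in state j
    p0 = [0.6; 0.4];                % initial state probabilities (pi in the text)
    O  = [1 2 2 1];                 % an observation sequence
    T = numel(O); N = size(A, 1);
    alpha = zeros(N, T);
    alpha(:,1) = p0 .* B(:,O(1));                          % initialization
    for t = 2:T
        alpha(:,t) = (A' * alpha(:,t-1)) .* B(:,O(t));     % induction
    end
    likelihood = sum(alpha(:,T));                          % termination: P(O | model)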

(b) HMM Testing

The Viterbi algorithm was introduced for decoding HMMs. For any unknown word to be recognized, measurements of the observation sequence are made via feature analysis of the speech, regardless of the word; the word whose model likelihood is maximum is then selected using the Viterbi algorithm (Hemantha, G.K. et al., 2006).
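A matching hedged sketch of Viterbi decoding, on the same kind of toy discrete HMM as above (plain probabilities for brevity; practical systems work in the log domain to avoid underflow):

    % Viterbi decoding: most likely state path for an observation sequence O.
    A  = [0.7 0.3; 0.2 0.8]; B = [0.9 0.1; 0.3 0.7];   % toy model (assumed values)
    p0 = [0.6; 0.4]; O = [1 2 2 1];
    T = numel(O); N = size(A, 1);
    delta = zeros(N, T); psi = zeros(N, T);
    delta(:,1) = p0 .* B(:,O(1));
    for t = 2:T
        for j = 1:N
            [delta(j,t), psi(j,t)] = max(delta(:,t-1) .* A(:,j));
            delta(j,t) = delta(j,t) * B(j,O(t));       % weight by emission prob.
        end
    end
    [bestScore, q] = max(delta(:,T));
    path = zeros(1, T); path(T) = q;
    for t = T-1:-1:1
        path(t) = psi(path(t+1), t+1);                 % backtrack best path
    end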

2.5.3.2 Artificial Neural Network (ANN)

The Artificial Neural Network (ANN), often simply called a Neural Network (NN), is a computational or mathematical model based on biological neural networks. ANNs are made up of interconnected artificial neurons and may be used either to gain an understanding of biological neural networks or to solve artificial intelligence problems without necessarily creating a model of a real biological system. ANNs belong to the artificial intelligence approaches that attempt to mechanize the recognition procedure, following the way a person applies intelligence in visualizing, analyzing and characterizing speech based on a set of measured acoustic features (Madisetti, V.K. & William, D.B., 1999).

According to Huang, X. et al. (2001), dealing with non-stationary signals requires addressing how to map an input sequence to an output sequence properly. When the two sequences are not synchronous, proper alignment, segmentation and classification are needed; basic neural networks are not well equipped to address these problems compared to HMMs.

Figure 2.4: Interconnected group of nodes in ANN (from Huang, X. et al., 2001)

2.5.3.3 Vector Quantization (VQ)

Quantization is the process of approximating continuous-amplitude signals by discrete symbols. Quantization of a single signal value or parameter is known as scalar quantization, while quantization of a block of values is known as vector quantization. Huang, X. et al. (2001) described the vector quantizer in terms of a codebook, a set of fixed prototype or reproduction vectors, where each prototype vector is known as a codeword. To perform quantization, each input vector is matched against each codeword in the codebook using a distortion measure. The VQ process thus includes the distortion measure and the generation of each codeword for the particular codebook involved. The goal of VQ is to minimize the distortion (Vuuren, S.V., 1996).

VQ is divided into two parts, known as feature training and feature matching. Feature training is mainly concerned with selecting feature vectors and training the codebook using a Vector Quantization (VQ) algorithm. The training of the VQ codebook applies an important algorithm known as the LBG VQ algorithm, which clusters a set of L training vectors into a set of M codebook vectors and is formally implemented by a recursive procedure (Linde, Y. et al., 1980). The steps required for training the VQ codebook using the LBG algorithm are described by Rabiner, L. & Juang, B.H. (1993); a sketch is given below.
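As a hedged illustration of that recursive procedure (the function name lbg_train, the split parameter eps, the 20 refinement iterations and the squared-Euclidean distortion are illustrative assumptions; M is assumed to be a power of 2):

    % Sketch of LBG codebook training: split-and-refine clustering of the
    % L training vectors in X (L-by-p) into an M-codeword codebook.
    function codebook = lbg_train(X, M, eps)
        L = size(X, 1);                        % number of training vectors
        codebook = mean(X, 1);                 % start from a single centroid
        while size(codebook, 1) < M
            codebook = [codebook*(1+eps); codebook*(1-eps)];  % split step
            K = size(codebook, 1);
            for iter = 1:20                    % k-means style refinement
                d = zeros(L, K);
                for k = 1:K                    % distortion to each codeword
                    diffs = X - repmat(codebook(k, :), L, 1);
                    d(:, k) = sum(diffs.^2, 2);
                end
                [dmin, idx] = min(d, [], 2);   % nearest-codeword assignment
                for k = 1:K                    % centroid update
                    if any(idx == k)
                        codebook(k, :) = mean(X(idx == k, :), 1);
                    end
                end
            end
        end
    end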

Figure 2.5 shows a block diagram of a vector quantizer, which consists of two main parts, known as the encoder and the decoder. The task of the encoder is to identify in which of N geometrically specified regions the input vector lies. The decoder is then a table lookup and is fully determined by specifying the codebook (Wai, C.C., 2003).

Figure 2.5: The Encoding-Decoding Operation in VQ (from Wai, C.C., 2003)

2.5.4 Recognition/Identification

There are many methods used for recognition as well as identification. Within speech recognition, the methods most commonly used nowadays are listed below:

1. Hidden Markov Model (HMM)

2. Vector Quantization (VQ)

3. Artificial Neural Network (ANN).

2.5.4.1 Hidden Markov Model (HMM)

As described in section 2.5.3 under feature classification, the HMM method has been fully implemented for both recognition and training purposes (Lee, K.F. & Hon, H.W., 1989). In the research by Jurafsky, D. & Martin, J.H. (2007), the HMM was used for a digit recognition task. A lexicon specifies the phone sequence, and each phone HMM is composed of three sub-phones with a Gaussian emission likelihood model. The observation likelihood is computed by the acoustic model. Combining all these elements, with an optional silence added at the end of each word, results in a single HMM for the whole task. Note that the transition from the 'End state' to the 'Start state' allows digit sequences of arbitrary length.

On the other hand, recognition has also been carried out by Viterbi, A.J. (1967) through research on a large HMM. For context-independent phone recognition, an initial and a final state are created. The initial state is connected with null arcs to the initial state of each phonetic HMM, and null arcs connect the final state of each phonetic HMM to the final state. The final state is also connected back to the initial state. The Hidden Markov Model (HMM) is widely used as a statistical method for characterizing the spectral properties of the frames of an utterance. The process can be treated as a random process whose parameters can be estimated precisely and accurately.

2.5.4.2 Vector Quantization (VQ)

The most successful text-independent recognition methods are based on VQ. In this approach, the VQ codebook consists of a small number of representative feature vectors, which are used as an efficient means of characterizing speaker-specific features. A speaker-specific codebook is generated by clustering the training feature vectors of each speaker, as described in part 2.5.3.3. In the recognition stage, an input utterance is vector-quantized using the codebook of each reference speaker, and the VQ distortion accumulated over the entire input utterance is used to make the recognition decision. The VQ-based method is believed to be more robust than a continuous HMM method, as stated by Matsui, T. & Furui, S. (1993) in their research.

2.5.4.3 Artificial Neural Network (ANN)

The Artificial Neural Network (ANN), also known as a Neural Network (NN), is mainly used for feature matching or recognition in speech processing. It is normally used to classify a set of features representing the spectral-domain content of the speech (regions of strong energy at particular frequencies). The features are converted into phonetic-based categories at each frame, and a Viterbi search is then used to match the neural-network output scores to the target words (the words assumed to be in the input speech), in order to determine the word most likely uttered (Hosom, J.P. et al., 1999).

2.6 Comparison of Speech Recognition techniques for Quranic Arabic recitation

This section provides a comparison of speech recognition systems built on the techniques discussed in this literature. The main criterion for comparing the approaches applied to Quranic Arabic is their performance, as shown in table 2.4 below.

Table 2.4: Approaches used by Quranic Arabic recitation using speech recognition
techniques
References | Pre-processing Method | Feature Extraction | Classification/Recognition Techniques | Performance
[Tabbal, H. et al. '06] | Pre-emphasis filter | MFCC | Hidden Markov Model (HMM) | 85% - 92%
[Youssef, A. & Emam, O. '04] | - | MFCC | Hidden Markov Model (HMM) | 90.2%
[Ahmad, A.M. et al. '04] | Digital filtering | MFCC, LPCC | Recurrent Neural Network (RNN) | MFCC 95.9% - 98.6%; LPCC 94.5% - 99.3%
[Bashir, M.S. et al. '03] | Preemphasis filtering (bandpass filter) | Spectrographic analysis | Spectrographic analysis based on intensity in different frequency bands | 93.33%
[Kirchoff, K. et al. '04] | Kneser-Ney smoothing | Not stated | Hidden Markov Model (HMM) | Not stated
[Hasan, M.R. et al. '04] | - | MFCC | Vector Quantization (VQ) | 57% - 100%
[Podder, S.K. '97] | - | LPC | VQ and HMM | 62% - 96%
[Bhotto, M.Z.A & Amin, M.R. '04] | - | MFCC | Vector Quantization (VQ) | 70% - 85%

2.7 Summary

In this study, the different methods and approaches have been discussed in order to find the most suitable method for this project. The MFCC method was chosen for feature extraction because it builds on the DFT and FFT algorithms, and because the majority of researchers have used MFCCs as their main features for extraction purposes.

On the other hand, training as well as recognition will be conducted using HMM, ANN or VQ. These three methods are the ones normally used in speech recognition today and are the most dominant pattern recognition techniques in the field; each has shown strong performance in its own way, along with its own benefits and weaknesses. From this point of view, HMM is the most suitable method, and it has been implemented by most researchers in Arabic speech recognition. However, those implementations have been speaker-dependent rather than speaker-independent, with low accuracy. This is quite different from the VQ algorithm, which is mostly used by researchers in speech recognition projects involving the English language. In addition, the combination of the MFCC and HMM techniques has mostly been implemented for speech recognition applications in Arabic, as shown in table 2.4. These techniques have also been proven applicable to this research, since the reported performance was above 90%.

CHAPTER 3

RESEARCH METHODOLOGY

3.1 Introduction

The previous chapter showed that the success of automated speech recognition systems requires a combination of various techniques and algorithms, each performing a specific task towards the main goal of the system. A combination of related algorithms is therefore essential to improve the accuracy and recognition rate of such applications. This chapter highlights the research methodology for the development of the Automated Tajweed checking rules engine for Quranic verse recitation, with emphasis on the techniques and algorithms used for the development and implementation of this engine. In addition, this chapter provides a step-by-step MATLAB implementation of the feature extraction, feature classification and feature matching processes used in developing this engine.

Here, the Mel-Frequency Cepstral Coefficients (MFCC) feature extraction algorithm is described and applied to all sets of ayates or phonemes of the Quranic recitation. This engine also implements the Hidden Markov Model (HMM) algorithm, mainly for classification/matching and pattern recognition purposes.

3.2 Tajweed checking rules engine techniques and algorithms

As mentioned in previous chapters, the conventional Hidden Markov Model (HMM) method for speech recognition has been highlighted. In this technique, the feature vectors of the speech are extracted, and the recognition result depends on the log likelihood of each word in the vocabulary: the word with the largest log likelihood is chosen as the recognition result. Since different people can pronounce even the same sentence differently, HMM classification is used to improve the accuracy of recognition.

This chapter discusses the techniques and algorithms involved in this research in detail. First, the input is filtered to remove noise, and features are extracted using the Mel-Frequency Cepstral Coefficients (MFCC) feature extraction technique, which extracts the important characteristics of the speech signal and outputs a set of feature vectors. The whole sentence can then be estimated and classified; the pattern classification method used here is the Hidden Markov Model (HMM). The entire process in this research is shown below:

Input : Quranic verse recitation of Sourate Al-Fatihah

Output : Result of Sourate Al-Fatihah recitation – notification for any correct or

incorrect recitation based on Tajweed rules

Stage 1: Training

Begin

Step 1 : Input speech signal of Quranic verse recitation is sampled

Step 2 : Preemphasis is executed – Finite Impulse Response (FIR) filter

Step 3 : The speech signal is framed

Step 4 : Framed speech signal is windowed by using Hamming Window

Step 5 : Fast Fourier Transform is applied to the windowed speech signal

Step 6 : Mel-Frequency Cepstral Coefficients (MFCC) is calculated

Step 7 : HMM model is developed, i.e: λ (A, pi0, mu, sigma) is evaluated and

stored in the database

End

Stage 2: Testing/Recognition

Begin

Step 1 : Input speech signal of Quranic verse recitation is sampled

Step 2 : Preemphasis is executed – Finite Impulse Response (FIR) filter

Step 3 : The speech signal is framed

Step 4 : Framed speech signal is windowed by using Hamming Window

Step 5 : Fast Fourier Transform is applied to the windowed speech signal

Step 6 : Mel-Frequency Cepstral Coefficients (MFCC) is calculated

Step 7 : HMM model is developed, i.e: λ (A, pi0, mu, sigma) is evaluated

Step 8 : The observation sequence and HMM values, obtained from the test input

are compared with all models present in the database, through the Viterbi

algorithm

Step 9 : The recognition results of the recognized word is decided based on the

maximum value of log likelihood of the test data match with trained data

End

3.2.1 Speech Samples Collection (Speech Recording)

In this part, the recording process is executed to collect Quranic recitation speech samples from different speakers. According to Rabiner and Juang (1993), there are 4 main factors that need to be considered while collecting speech samples:

1. Who are the talkers

2. The speaking condition

3. The transducers & transmission systems

4. The speech unit

These 4 factors need to be identified before any recording is executed, because they affect the performance and the output result, especially the training set vectors that will be used in the training and testing processes. In this project, the automated Tajweed checking rules engine uses a simple MATLAB function for recording the speech samples. Figure 3.1 below shows the MATLAB code used in the recording process.

Figure 3.1: MATLAB code for recording process

However, this function requires the user to define certain parameters before the recording process is carried out, including the sampling rate (Hz) and the time length in seconds. Here, the MATLAB command "wavrecord" is used to read the audio signal directly from the microphone. The command format is:

y = wavrecord (n, fs);

where "n" is the number of samples to be recorded and "fs" is the sampling rate. In this recording part, the recording duration is 4 seconds, recorded using a normal microphone. "Duration*fs" is the number of sample points to be recorded. The recorded sample points are stored in the variable "y" with a vector size of 64000x1.
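As a minimal sketch of this recording step (assuming the 16 kHz sampling rate and 4-second duration stated in this chapter; wavrecord is available only in the Windows releases of MATLAB of that era, and the variable and file names are illustrative):

    % Recording sketch using the wavrecord command described above.
    fs = 16000;                              % sampling rate in Hz (table 3.1)
    duration = 4;                            % recording length in seconds
    speech = wavrecord(duration*fs, fs);     % 64000x1 vector of sample points
    wavwrite(speech, fs, 'recitation.wav');  % store the sample for analysis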

3.2.2 Mel-Frequency Cepstral Coefficients Feature Extraction

In the late 1970s, coefficients derived from the cepstrum began to replace Linear Prediction Coefficients (LPC) as the basic algorithm and parameter set for speech recognition applications. The Mel-Frequency Cepstral Coefficients (MFCC) technique is frequently used nowadays for feature extraction in speech processing. It introduced the use of the Mel scale in the derivation of the cepstrum coefficients; the Mel scale is a mapping of the linear frequency scale based on human auditory perception (Levent, M.A., 1996).

As mentioned earlier, the main objective of feature extraction is to extract the important characteristics of the speech signal that are unique to each word, in order to differentiate between a wide set of distinct words. According to Ursin, M. (2002), MFCC is considered the standard method for feature extraction in speech recognition and is perhaps the most popular feature extraction technique used nowadays. MFCC is able to obtain better accuracy with minor computational complexity compared to other feature extraction techniques (Davis, S.B. & Mermelstein, P., 1980).


Figure 3.2: Block diagram of the computation steps of MFCC

The proposed method for feature extraction is given in figure 3.2 above. This stage emphasizes the MFCC computational process as the main algorithm for feature extraction analysis. The MFCC feature extraction algorithm has been applied to all collected speech samples to obtain the targeted feature vectors as output. Certain parameters need to be defined before the MFCC algorithm is run and the coefficient values are estimated. Tables 3.1 and 3.2 show the parameter values and the MFCC filter settings used in the MFCC MATLAB code.

Table 3.1: MFCC Parameter Definition

Parameter | Value
Time Length | 4 seconds
Sampling Rate | 16 000 Hz
Frame Size (windowSize) | 256 samples
Number of filters | 40

Table 3.2: MFCC Filter Equations

Parameter | Value
FFT points (NFFT) | 2048
Linear filters (Nlinear) | 13
Logarithmic filters (Nlog) | 27
Spacing of linear filters (Slinear) | 66.667 Hz
Spacing of logarithmic filters (Slog) | 1.0712
Lower bound of the 1st filter (f0) | 133.13 Hz

The voice input is recorded using a normal microphone and the sound recorder utility provided by Windows XP or Vista. In the automated Tajweed checking rules engine, the speech is sampled at 16 000 Hz for a time length of 4 seconds, with a sampling precision of 16 bits. In the preprocessing stage, the array of speech signal samples is obtained from the microphone after the recording process. The time graph and spectrum of the speech signal are then calculated and displayed as plots. Figure 3.3 below shows the time graph and spectrum for the Quranic recitation of "Bismillahi Al-Rahmani Al-Rahiim".

Figure 3.3: Time and Spectrum graph for the recitation “Bismillahi Al-Rahmani Al-
Rahim”

3.2.2.1 Preemphasis

Preemphasis is considered the first step of MFCC, under the preprocessing stage in speech processing, which involves the conversion of the signal from analog to digital. The sequence of samples x[n] is obtained from the continuous-time signal x(t) through the relationship below:

x[n] = x(nT)    (1)

where T is the sampling period, 1/T = fs is the sampling frequency in samples/sec, and n is the sample index. The above equation is used to obtain a discrete-time representation of a continuous-time signal through periodic sampling. The number of samples in the digital signal is determined by the sampling frequency and the length of the speech signal in seconds. The first stage of MFCC feature extraction boosts the amount of energy in the high frequencies. This can be seen in the spectrum of speech segments such as vowels, where there is more energy at the lower frequencies than at the higher frequencies; this drop of energy across frequencies is caused by the nature of the glottal pulse (Jurafsky, D. & Martin, J.H., 2007). Preemphasis increases the energy of the signal as the frequency increases. It is implemented with a filter based on the equation below:

y[n] = x[n] - 0.97 x[n-1]    (2)

Figure 3.4: MATLAB code for the Preemphasis stage of MFCC

Preemphasis is executed after the digitization of the speech signal, using a 1st-order FIR (Finite Impulse Response) filter:

H(z) = 1 - αz^(-1)    (3)

where α is the preemphasis parameter, set to a value close to 1 (in this case 0.97). Applying this FIR filter to the speech signal produces the preemphasized signal of equation (2) above, as implemented in the MATLAB code in figure 3.4.
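A short sketch of this filter, equivalent in spirit to the code in figure 3.4 (the signal name 'speech' is illustrative):

    % Preemphasis sketch: 1st-order FIR filter H(z) = 1 - 0.97z^(-1),
    % i.e. y[n] = x[n] - 0.97*x[n-1] from equation (2).
    alpha = 0.97;                        % preemphasis parameter
    y = filter([1 -alpha], 1, speech);   % 'speech' is the recorded vector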

3.2.2.2 Framing

After the preemphasis filtering is executed, the filtered input speech is framed; here the columns of data from the particular speech input are determined. The Fourier transform used later is only reliable when the signal is stationary; for speech, this holds only within a short time interval, less than 100 milliseconds per frame. Thus, the speech signal is decomposed into a series of short segments, each frame is analyzed, and useful features are extracted from it. A window size (frame) of 256 points is chosen in this research, as can be seen in figure 3.6.

Figure 3.5: MATLAB code for framing stage of MFCC
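As a hedged sketch of this framing step (the hop of 100 samples, giving roughly the 60% overlap described in section 3.2.2.3, is an assumption for illustration; variable names are not from the thesis code):

    % Framing sketch: chop the preemphasized signal y into 256-sample frames.
    windowSize = 256;
    hop = 100;                                   % ~60% overlap between frames
    numFrames = floor((length(y) - windowSize)/hop) + 1;
    frames = zeros(windowSize, numFrames);       % one frame per column
    for k = 1:numFrames
        idx = (k-1)*hop + (1:windowSize);
        frames(:, k) = y(idx);
    end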


Figure 3.6: Framing Signal (Frame size = 256 samples)

3.2.2.3 Windowing

Windowing is one of the important parts of the MFCC feature extraction process. Each individual frame of the speech signal is windowed in order to minimize the signal discontinuities at the beginning and end of each frame. The purpose is to minimize spectral distortion by tapering the signal to zero at the beginning and end of each frame. The window is defined as:

w(n), where 0 ≤ n ≤ N - 1    (4)

with N the number of samples in each frame. The windowed signal y(n) is defined as:

y(n) = x(n) · w(n), 0 ≤ n ≤ N - 1    (5)

The Hamming window w(n) used in this work is given by equation (6) below:

w(n) = 0.54 - 0.46 cos(2πn/(N - 1)) for 0 ≤ n ≤ N - 1; w(n) = 0 otherwise    (6)

Figure 3.7 shows the MATLAB code for windowing the segmented speech samples, whereas figure 3.8 shows the resulting Hamming window graph.

Figure 3.7: MATLAB code for the windowing stage of MFCC

The effect of windowing a speech sample can be visualized clearly in figure 3.9: the transition of the speech sample becomes smooth towards the edges of the frame.
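A hedged sketch of this step, building the window of equation (6) directly and applying it to the frames from the framing stage (variable names are illustrative):

    % Windowing sketch: 256-point Hamming window of equation (6) applied
    % to every frame (frames is windowSize-by-numFrames).
    n = (0:windowSize-1)';
    w = 0.54 - 0.46*cos(2*pi*n/(windowSize-1));            % Hamming window
    windowedFrames = frames .* repmat(w, 1, size(frames, 2));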


Figure 3.8: Hamming Window


Figure 3.9: Windowed speech segment

Once the speech sample has been framed and windowed, the data at the ends of each frame are reduced towards zero, which would result in a loss of information. Thus, overlapping between frames is allowed: the adjacent frame includes a portion of the data of the current frame, so that the edges of the current frame fall at the center of the adjacent frames. Normally, around 60% overlap is sufficient to cover the lost information and to smooth the varying parameters. The Fast Fourier Transform (FFT) is then applied to the windowed speech samples, converting each frame of N samples from the time domain into the frequency domain.

3.2.2.4 Discrete Fourier Transform (DFT)

According to Owen, F.J. (1993), the Discrete Fourier Transform (DFT) is normally computed via the Fast Fourier Transform (FFT) algorithm. This algorithm is widely used for evaluating the frequency spectrum of speech, and it converts each frame of N samples from the time domain into the frequency domain. The FFT on the set of N samples {x_k} is defined as:

X_n = Σ_{k=0}^{N-1} x_k e^(-j2πkn/N), where n = 0, 1, 2, ..., N - 1    (7)

In this research, each windowed speech segment is transformed into the frequency domain using the Fourier transform through the MATLAB command shown in figure 3.10, which computes the FFT and returns the DFT values:

Figure 3.10: FFT computation of MATLAB code
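A minimal equivalent of that command (a sketch, assuming the NFFT = 2048 value from table 3.2 and the frames produced by the windowing stage):

    % FFT sketch: transform each windowed frame to the frequency domain,
    % zero-padded to NFFT = 2048 points as listed in table 3.2.
    NFFT = 2048;
    spectrum = fft(windowedFrames, NFFT);          % DFT values, column-wise
    magSpectrum = abs(spectrum(1:NFFT/2+1, :));    % keep the non-redundant half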

3.2.2.5 Mel Filterbank

The Mel scale is applied to place more emphasis on the low-frequency components, because the information carried by the low-frequency components of the speech signal is more important than that of the high-frequency components. The Mel scale is a unit of perceived pitch of a tone. The Mel filterbank is also known as Mel frequency warping; it does not correspond linearly to normal frequency, but behaves linearly below 1000 Hz and with logarithmic spacing above 1000 Hz. The following equation is the approximate empirical relationship for computing the Mel frequency of a given frequency f expressed in Hz:

Mel(f) = 2595 log10(1 + f/700)    (8)

To implement the filterbank, the magnitude coefficients of each Fourier-transformed speech segment are binned by correlating them with the triangular filters in the filterbank. In other words, Mel scaling is performed using a number of triangular filters or filterbanks (Thomas, F.Q., 2002).
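As a small illustration of equation (8), the following hedged sketch converts between Hz and Mel and places mel-spaced filter edges. It is a simplified, purely mel-spaced variant (the thesis itself uses 13 linear plus 27 logarithmic filters, table 3.2); the 133.13 Hz lower bound comes from table 3.2 and the 8 kHz upper bound is half of the 16 kHz sampling rate:

    % Mel-scale sketch: frequency warping of equation (8), used here to
    % place the edges of 40 triangular filters (table 3.1).
    hz2mel = @(f) 2595 * log10(1 + f/700);
    mel2hz = @(m) 700 * (10.^(m/2595) - 1);                 % inverse mapping
    edgesMel = linspace(hz2mel(133.13), hz2mel(8000), 42);  % 40 filters, 42 edges
    edgesHz = mel2hz(edgesMel);                             % triangular filter edges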

Figure 3.11: MFCC Cepstral Coefficients computation of MATLAB code

In this part, the Mel-Frequency Cepstral Coefficients (MFCC) corresponding to the input are obtained. The output results can be seen in the MFCC cepstral coefficients graph shown in figure 3.12.


Figure 3.12: Result of MFCC Cepstral Coefficients

3.2.2.6 Discrete Cosine Transform (DCT)

The DCT is a Fourier-related transform similar to the Discrete Fourier Transform (DFT), but it uses only real numbers. The DCT is used to extract the Mel-Frequency Cepstral Coefficients (MFCC) results, and it is often used to calculate the cepstrum instead of the inverse FFT.
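A hedged sketch of this step (assuming the Signal Processing Toolbox dct function; melEnergies and the choice of 13 retained coefficients are illustrative assumptions, since the thesis does not state how many coefficients are kept):

    % DCT sketch: take the log of the Mel filterbank energies and apply the
    % DCT to obtain the cepstral coefficients (the MFCC feature vector).
    % melEnergies is numFilters-by-numFrames.
    logMel = log(melEnergies);
    mfcc = dct(logMel);          % DCT along each column (per frame)
    mfcc = mfcc(1:13, :);        % keep the first 13 cepstral coefficients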

In this research, this part is the final step of computing the MFCCs. It requires computing the logarithm of the magnitude spectrum in order to obtain the Mel-Frequency Cepstral Coefficients. At this stage, the MFCCs are ready to be formed into a vector format known as the feature vector. This feature vector is then the input for the next process, which is concerned with training the feature vectors for recognition purposes. The result of the MFCC cepstral coefficients is shown below:

Figure 3.13: The MFCC cepstral coefficients for ayates ‘Maaliki yawmid diini’

3.2.3 Hidden Markov Model Classification

The Hidden Markov Model (HMM) is a statistical model widely used in pattern recognition, especially in speech recognition, for characterizing the spectral properties of the frames of a certain pattern. Using the HMM, the input speech signal is characterized as a parametric random process, and the parameters of the stochastic process can be determined in a precise and well-defined manner. The parameters of the HMM model need to be updated regularly so that the system can fit a sequence for a particular application. Training the HMM model is therefore important for representing the utterances of words; this model is later used for testing utterances and calculating the probability of the HMM model for the observed sequence of vectors.

In the HMM statistical approach, the input speech of the Quranic recitation is represented by probability distributions. In Markov models, if the observation is a probabilistic function of the state, the model is called a Hidden Markov Model: it consists of a doubly embedded stochastic process whose underlying process is not directly observable (hidden) and can only be observed through another stochastic process that produces the sequence of observations (Rabiner, L.R. & Juang, B.H., 2003).

In this research, an HMM with multivariate Gaussian state-conditional distributions has been used. The HMM is characterized by the following elements:

N : Number of states.

pi0 (π0) : Row vector containing the probability distribution of the first (unobserved) state: π0(i) = P(s1 = i)    (9)

A : State transition probabilities: aij = P(s_{t+1} = j | s_t = i)    (10)

mu (μ) : Mean vectors of the state-conditional distributions, stacked as row vectors, such that mu(i,:) is the mean (row) vector corresponding to the i-th state of the HMM.

sigma (Σ) : Covariance matrices. These values are stored in 2 different ways, depending on whether full or diagonal covariance matrices are used:

Full covariance matrices: Sigma((1+(i-1)*p):(i*p), :)    (11)

Diagonal covariance matrices: Sigma(i, :)    (12)

Figure 3.14 below depicts the automated Tajweed checking rules system structure, illustrated as a speaker recognition system for Quranic verse recitation. There are 2 main stages in a speech recognition system: the training stage and the recognition stage. In the training stage, models (patterns) are generated from the input speech samples through the feature extraction process and modeling techniques. In the recognition stage, feature vectors are generated from the input speech samples using the same extraction procedure as in the training stage; classification and decision are then executed with matching techniques. Under classification, the recognition task can be divided into either an identification or a verification process.

Figure 3.14: Automated Tajweed Checking Rules system structure (λ = model parameter)

Moreover, a distinct HMM is used to model each word of the vocabulary. Each word in the vocabulary has a training set of k utterances by different speakers (Rabiner, L. & Juang, B.H., 1993); each utterance constitutes an observation sequence of MFCCs. Isolated-word speech recognition for the Automated Tajweed checking rules engine consists of the following 3 major steps:

(1) Training/Modeling: For each word in the vocabulary, build an HMM model and estimate the model parameters λ = (A, pi0, mu, sigma) that maximize the likelihood of the training set observation vectors.

(2) Identification: For each unknown word to be recognized, measure the observation sequence through feature analysis of the speech corresponding to the word. The word is then selected using the Viterbi algorithm, choosing the model whose likelihood is maximum, as given in figure 3.14.

(3) Verification: The input features are compared with the registered patterns, and the features giving the highest score identify the selected/target speaker (recitor) and recitation result. These input features are then compared with the claimed speaker (recitor), and a decision is made either to accept or reject the claim/result.

Of these 3 major steps, the training/modeling step is executed during HMM training, while the identification and verification steps are carried out during HMM testing/matching.

3.2.3.1 Hidden Markov Model Training

Training a Hidden Markov Model is used to model and represent the particular utterances of a word or phoneme from the Quranic recitation. A complete specification of the HMM thus requires the 2 model dimensions N and p, as well as the 3 sets of probability measures A, mu and sigma and the initial state distribution pi0. According to Hemantha, G.K. et al. (2006), the complete parameter set of an HMM model is denoted by λ = (A, B, pi0); in this research, B is represented by the 2 sets of measures mu and sigma, so the model is denoted as:

λ = (A, pi0, mu, sigma)    (13)

Training is done by adjusting the parameters of the model λ = (A, pi0, mu, sigma); this adjustment is an estimation of the model parameters that maximizes P(O|λ). The values obtained for the λ model are stored in the database for further processing in the testing/recognition part of stage 2. The sequence for creating an HMM model of speech utterances is shown below:

Figure 3.15: The HMM sequence of training block diagram

(a) Initialization

A: The state transition probability matrix, using the Left-to-Right model. The state transition probability matrix A is initialized with equal probability for each state, and it can be made sparse to save memory space (A should be upper triangular for a Left-to-Right model).
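A hedged sketch of such an initialization for N = 14 states, as used for the ayates 'Maaliki yawmid diini' (the stay-or-advance structure with two equal transitions per state is an illustrative assumption):

    % Left-to-Right initialization sketch: each state either stays or moves
    % to the next state with equal probability (upper-triangular A).
    N = 14;                          % number of states
    A = zeros(N);
    for i = 1:N-1
        A(i, i)   = 0.5;             % self-transition
        A(i, i+1) = 0.5;             % transition to the next state
    end
    A(N, N) = 1;                     % final state loops on itself
    pi0 = [1, zeros(1, N-1)];        % always start in state 1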

The values of A were obtained after the MATLAB simulations were successfully executed. Those A values are initialized with equal probability for each state, as shown in the MATLAB command window:
Figure 3.16: The state transition probability matrix (A) for ayates ‘Maaliki yawmid diini’

pi0: Initialize the initial state probability distribution, using the Left-to-Right model. The initial state probability distribution pi0 is initialized to be deterministic, starting in state 1 (i.e. pi0 = [1 0 ... 0]). This description is based on speech recognition theory (Rabiner, L.R., 1989).

pi0 = [1 0 0 0 0 0 0 0 0 0 0 0 0 0]

where 1 ≤ i ≤ number of states; in this case, for the ayates 'Maaliki yawmid diini', the number of states is i = 14.

The mean vectors (mu (μ)) and covariance matrices (sigma (Σ)) of the model parameters are initialized using multiple observations for a Left-to-Right Hidden Markov Model (HMM). These values determine the dimensions of the model (size of the observation vector and number of states) and the type of covariance matrices (either full or diagonal) from the size of the input arguments.

Figure 3.17 shows the MATLAB code for initializing the model parameters mu (µ) and sigma (Σ), using multiple observations for a Left-to-Right HMM. Here, each parameter sequence of speech is chopped into N segments of equal length, where N is the number of states.

Figure 3.17: MATLAB code for initialize the model (mu, sigma)

Figure 3.18: M-File function of hmm_mint

Most functions (with mu and sigma as their input arguments) are able to determine the dimensions of the HMM model (size of the observation sequence and number of states N) and the type of covariance matrices (either full or diagonal) from their input arguments; this can be checked through the function hmm_chk. Below are the model parameter values of mu (µ) and sigma (Σ) for the ayates 'Maaliki yawmid diini'.

Figure 3.19: The mean vectors mu (µ) for ayates ‘Maaliki yawmid diini’

Figure 3.20: The covariance matrices sigma (Σ) for ayates ‘Maaliki yawmid diini’

(b) Probability Evaluation

In this part, multiple iterations of the Expectation-Maximization (EM) algorithm for a Left-to-Right model are performed with multiple training sequences. This process is a call to the lower-level functions, where the values of A, mu and sigma supplied in part 3.2.3.1 (a) are used as initialization (A_, mu_, sigma_) values. These values are then used in the next process, where the Forward-Backward recursions (with scaling) are executed. Figure 3.21 shows the MATLAB code for the Forward-Backward recursion implementation.

Figure 3.21: MATLAB code for Forward-Backward Recursions

In the MATLAB code shown above, alpha is the forward variable and beta is the backward variable, with log1 holding the log-likelihood values. Note that at each step the log-likelihood is computed from the forward variables using the log1 term returned by hmm_fb (forward-backward), which is the sum of the logarithmic scaling factors used during the computation of alpha and beta. Another variable, dens, contains the values of the Gaussian densities for each time index (useful for estimating the transition probabilities). Brief descriptions of the variables involved in this part follow:

(i) α (alpha): The Forward Algorithm

The probability of an observation sequence O = {O1 O2 ... OT} for a model λ = (A, pi0, mu, sigma) can be evaluated by finding which model most likely produced the observation sequence. Every possible sequence of states of length T can be evaluated through the equation below (here the mu and sigma values are represented by b or B):

P(O | λ) = Σ_{q1,q2,...,qT} π_{q1} b_{q1}(o1) Π_{t=2}^{T} a_{q_{t-1} q_t} b_{q_t}(o_t)    (14)

Based on equation (14), at time t = 1 the process is in state q1 with probability π_{q1} and generates the symbol o1 with probability b_{q1}(o1). The clock changes from t to t + 1, a transition from q1 to q2 occurs with probability a_{q1 q2}, and the symbol o2 is generated with probability b_{q2}(o2). The process continues in this manner until the last transition is made at time T: a transition from q_{T-1} to q_T occurs with probability a_{q_{T-1} q_T}, and the symbol o_T is generated with probability b_{qT}(oT). The Forward Algorithm is based on the forward variables α_t(i), defined by:

α_t(i) = P(o1 o2 ... ot, q_t = i | λ)    (15)

From equation (15), α_t(i) is the probability, given the model, of being in state i at time t having generated the partial observation sequence from the first observation up to observation number t, o1 o2 ... ot. The Forward Algorithm can be computed at any time t, 1 ≤ t ≤ T, as shown below:

1. Initialization

Set t = 1;
α_1(i) = π_i b_i(o1), 1 ≤ i ≤ N

Here the forward variable gets its start value (the joint probability of being in state i and observing the symbol o1). Only α_1(1) has a nonzero value in a Left-to-Right model.

2. Induction

α_{t+1}(j) = b_j(o_{t+1}) Σ_{i=1}^{N} α_t(i) a_ij, 1 ≤ j ≤ N

3. Update time

Set t = t + 1;
Return to step 2 if t ≤ T; otherwise, terminate the algorithm (go to step 4).

4. Termination

P(O | λ) = Σ_{i=1}^{N} α_T(i)

α (Alpha): Alpha scaled

Because the multiplication of many probabilities exceeds the numerical precision range, scaling of α (alpha) and β (beta) is necessary. Scaling of the forward variables is performed at each time index t = 2, ..., T, so that each row of the alpha matrix sums to 1 except the first one, as computed for the ayates of 'Maaliki yawmid diini', where 1 ≤ t ≤ T and the number of input frames is T = 46.
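A hedged MATLAB sketch of this scaled forward recursion (all names are illustrative and not the thesis hmm_fb code; B(i,t) stands for the Gaussian density b_i(o_t), and the first row is left unscaled as described above):

    % Scaled forward algorithm sketch: alpha(t,i) is the scaled forward
    % variable; loglik accumulates the logarithms of the scale factors.
    % A is N-by-N, pi0 is 1-by-N, B is N-by-T with B(i,t) = b_i(o_t).
    [N, T] = size(B);
    alpha = zeros(T, N);
    alpha(1, :) = pi0 .* B(:, 1)';                       % initialization
    loglik = 0;
    for t = 2:T
        alpha(t, :) = (alpha(t-1, :) * A) .* B(:, t)';   % induction
        scale = sum(alpha(t, :));
        alpha(t, :) = alpha(t, :) / scale;               % row sums to 1
        loglik = loglik + log(scale);                    % log P(O|lambda)
    end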

(ii) β (Beta): Backward Algorithm

If the recursion is run in the reverse direction of the forward variable, then β_t(i) is the backward variable, described by the following equation:

β_t(i) = P(o_{t+1} o_{t+2} ... oT | q_t = i, λ)    (16)

From equation (16), β_t(i) is the probability, given the model and being in state i at time t, of generating the partial observation sequence from observation t + 1 up to observation T, o_{t+1} o_{t+2} ... oT. The variable can be calculated inductively as follows:

1. Initialization

Set t = T - 1;
β_T(i) = 1, 1 ≤ i ≤ N

2. Induction

β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j), 1 ≤ i ≤ N

3. Update time

Set t = t - 1;
Return to step 2 if t ≥ 1; otherwise, terminate the algorithm.

β (Beta): Beta scaled

The backward variables are scaled using the same normalization factors as the forward scale factors, in order to ensure that the re-estimation of the transition matrix is correct.

(iii) Log (P(O|λ)): Probability of the observation sequence

The probability of the observation sequence, Log(P(O|λ)), is saved in a matrix so that the adjustment of the re-estimation sequence can be observed. In this case, Log(P(O|λ)) is represented by log1 in the MATLAB program. Note that the summation sum(log(scale)) of the total probability is used for every iteration. The current value of log1 is compared with the log1 of the previous iteration; if the difference is less than a threshold value, the maximum has been reached.

(c) Re-Estimation

The recommended algorithm for re-estimating the parameters of the model λ = (A, pi0, mu, sigma) is the iterative Baum-Welch algorithm. This algorithm is responsible for maximizing the likelihood function of the model λ = (A, pi0, mu, sigma): in every iteration, the Baum-Welch algorithm re-estimates the HMM parameters towards a closer (maximum) value. The Baum-Welch algorithm is based on a combination of the forward algorithm and the backward algorithm implemented before.

As mentioned earlier in part 3.2.3.1 (b), the values of A, mu and sigma are also used as initialization (A_, mu_, sigma_) values. Those values are used to re-estimate the transition parameters for the multiple-observation-sequence Left-to-Right HMM. Before this is carried out, the dimensions of the HMM model need to be checked and determined through the hmm_chk function, as discussed before. The re-estimation process can then be executed through the hmm_mest function, as shown in figure 3.22 below:

Figure 3.22: MATLAB code for the re-estimation of transition parameters

In this case, the matrix X contains all the observation sequences, while the vector st yields the index corresponding to the beginning of each sequence. Thus, X(1:st(2)-1, :) contains the vectors belonging to the first observation sequence, and so on up to X(st(length(st)):length(X(1,:)), :), which corresponds to the last observation sequence. The transition parameters are re-estimated in the hmm_mest function, where the posterior distributions of the states are returned in gamma (γ). Note also that mix_par is used for re-estimating the HMM parameters (mu_ and sigma_) from the posterior state probabilities. The transition parameter re-estimation is described below:

(i) A_: Re-estimate the state transition probability matrix

The Baum-Welch algorithm adjusts the model parameters by maximizing the probability of the model, using the equation below:

λ* = argmax_λ [P(O | λ)]    (17)

Here, the re-estimation of matrix A is quite extensive, due to the use of multiple observation sequences. The equation below is used to calculate an average estimate with contributions from all utterances used in the training session:

a_ij = (expected number of transitions from state i to state j) / (expected number of transitions from state i)
     = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)

(ii) mu_ (μ): Re-estimate mean vectors

A new mean value, x_mu(m, n), is used for the next iteration of the process, where the value of gamma (γ), γ_t(j, k), is used:

μ_jk = Σ_{t=1}^{T} γ_t(j, k) o_t / Σ_{t=1}^{T} γ_t(j, k)    (18)

(iii) sigma_ (Σ): Re-estimate covariance matrices

A new covariance value, x_sigma(m, n), is calculated and used for the next iteration, where the value of gamma (γ), γ_t(j, k), is used:

Σ_jk = Σ_{t=1}^{T} γ_t(j, k) (o_t - μ_j)(o_t - μ_j)' / Σ_{t=1}^{T} γ_t(j, k)    (19)

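A hedged sketch of equations (18) and (19) for a single Gaussian per state with diagonal covariances (obs and gamma are illustrative names, not the thesis mix_par interface):

    % Re-estimation sketch for equations (18) and (19), one Gaussian per state.
    % obs is T-by-p (one observation vector per row); gamma is T-by-N with
    % gamma(t,j) the posterior probability of state j at time t.
    [T, N] = size(gamma);
    p = size(obs, 2);
    mu_ = zeros(N, p);
    sigma_ = zeros(N, p);                       % diagonal covariances per state
    for j = 1:N
        w = gamma(:, j) / sum(gamma(:, j));     % normalized state posteriors
        mu_(j, :) = w' * obs;                   % weighted mean, equation (18)
        centered = obs - repmat(mu_(j, :), T, 1);
        sigma_(j, :) = w' * (centered.^2);      % weighted variance, equation (19)
    end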
(d) Result – Model of Hidden Markov Model (HMM)

Lastly, after the re-estimation process has been executed successfully, the HMM model for the specific utterance needs to be saved. The model developed represents the specific observation sequences (i.e. an isolated word) and is used later for recognition purposes. The HMM model obtained is discussed in detail in chapter 5. The model is stored with the denotation λ = (A_, mu_, sigma_) as a MATLAB MAT-file (7x14 matrices); only half of it is shown in figure 3.23 ('Maaliki yawmiddiini'):

Figure 3.23(a): MAT-file trained model of A_ values (State, i=1-14)

Figure 3.23(b): MAT-file trained model of mu_ (μ) values (State, i=1-13)

Figure 3.23(c): MAT-file trained model of sigma_ (Σ) values (State, i=1-13)

3.2.3.2 Hidden Markov Model Testing/Recognition

Decoding or aligning the acoustic feature sequence requires the prior specification of the parameters of the particular HMM. As mentioned earlier, the HMM models play the role of stochastic templates against which the observations are compared. Those templates consist of several sentences representing different phonemes of Quranic recitation. Each template can be determined and identified through the estimation of the HMM parameters, specified by a certain database containing the observation sequences, using either a supervised or an unsupervised learning method.

Based on the basic HMM concepts, the parameter λ defines the probability measure for the observation sequence O, i.e. P(O|λ). This observation sequence O = {O1 O2 O3 ... OT} needs to be compared with a model λ = (pi0, A, mu, sigma), in order to find the optimal sequence of states q = {q1 q2 q3 ... qT} for the given observation sequence and model. To maximize P(q|O, λ), the suitable algorithm is the Viterbi algorithm (Rabiner, L.R., 1989), which finds the best single state sequence for the given observation sequence (Rabiner, L.R. & Juang, B.H., 1993). The testing process is carried out by comparing the tested utterances with each model; a score value is obtained after each comparison.

In this case, the observation sequences O are not involved directly in the calculation; rather, the MFCC feature analysis of the speech samples corresponds to the word. For example, a reasonable measure of the similarity of two HMM models λ1 and λ2, using the concept of logarithmic distance (defining the distance measure D(λ1, λ2)) between the two Markov models, is denoted as:

D(λ1, λ2) = (1/T)[log10 P(O2 | λ1) - log10 P(O2 | λ2)]    (20)

where O2 = (O1, O2, ..., OT) is a sequence of observations generated by model λ2. Basically, the expression above measures how well model λ1 matches observations generated by model λ2.

Following the same concept, equation (20) has been applied to the current research, mainly for recognizing the Tajweed rules of certain ayates of Quranic recitation. Here, the log likelihood of the word/phoneme itself acts as the measurement. The standard Log Likelihood Ratio (LLR) is calculated as follows:

LLR = (1/N)[log P(best | O) - log P(2nd best | O)]    (21)

Here, N is the length of the input utterance, log P(best | O) is the largest log likelihood and log P(2nd best | O) is the second largest log likelihood. HMM testing is done by comparing the particular utterance under test with each model; an output score is produced for each comparison. The sequence for testing the Quranic utterances is as follows:

Figure 3.24: The HMM sequence of testing/recognition block diagram

(a) Initialization

(i) Log (A): State transition probability matrix of the model (Refer to HMM training)

Load the A_ values (MAT-file) from the trained model λ and calculate the logarithm of A. With a Left-to-Right model, however, taking the logarithm of the zero components in A and π causes problems, because the zero components turn into minus infinity. To avoid this problem, the MATLAB 'realmin' (smallest positive) value can be used, as shown in the MATLAB code in figure 3.25 below:

Figure 3.25: MATLAB code for ‘realmin’
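A one-line guard in the spirit of figure 3.25 (a generic sketch, not the thesis code):

    % 'realmin' sketch: replace exact zeros before taking logarithms so that
    % log(0) = -Inf never appears in the Left-to-Right model.
    logA   = log(max(A,   realmin));   % element-wise guard on the transitions
    logPi0 = log(max(pi0, realmin));   % same guard on the initial distribution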

(ii) mu (μ): Mean matrix from the model (Refer to HMM training)

Load the mu_ (μ) values (MAT-file) from the trained model λ.

(iii) Sigma (Σ): Covariance matrices from the model (Refer to HMM training)

Load the sigma_ (Σ) values (MAT-file) from the trained model λ.

(iv) Log (pi0): Initial state probability vector (Refer to HMM training)

The problem is similar to that of Log(A). Thus, a small number such as 'realmin' is added to the elements that contain a zero value, as described in detail in part 3.2.3.2 (a)(i). Note that the value of π is the same for each model.

(b) Probability Evaluation

(i) Log (P*): The probability calculation of the most likely state sequence. The max argument is at the last state. Here, log1 is used to represent Log P.

(ii) plog1: The state that gives the largest Log (P*) at time T is calculated; backtracking is used afterwards.

(iii) Path: Backtracking of the state sequence; the optimal state sequence is calculated.

(iv) Log (B): Compute the probability density values for each state i from the previous part (HMM training). Here, dens is used to represent it.

(v) Delta (δ): Maximization over a single path requires the quantity δ_t(i):

δ_t(i) = max_{q1, q2, ..., q_{t-1}} P(q1 q2 ... q_{t-1}, q_t = i, o1 o2 ... ot | λ)    (22)

The quantity δ_t(i) is the probability of having observed o1 o2 o3 ... ot along the best path ending in state i at time t, for a given model.

(vi) Psi (ψ): The optimal state sequence is retrieved and saved in a vector ψ_t(j), which records the state index maximizing δ_{t+1}(j). While calculating b_j(o_t), the values of μ and Σ are gathered from the different models for comparison purposes.

The ayates and phonemes of the Quranic recitation are recognized by comparing against the tested model with the help of the Viterbi algorithm, which finds the single best state sequence for the given observation sequence (Rabiner, L.R. & Juang, B.H., 1993). The steps for finding the best state sequence make up the Alternative Viterbi Algorithm listed below:

1. Preprocessing

π'_i = log(π_i), 1 ≤ i ≤ N
a'_ij = log(a_ij), 1 ≤ i, j ≤ N

2. Initialization

Set t = 2;
b'_i(o1) = log(b_i(o1)), 1 ≤ i ≤ N
δ'_1(i) = π'_i + b'_i(o1), 1 ≤ i ≤ N

3. Induction

b'_j(o_t) = log(b_j(o_t)), 1 ≤ j ≤ N
δ'_t(j) = b'_j(o_t) + max_{1 ≤ i ≤ N} [δ'_{t-1}(i) + a'_ij], 1 ≤ j ≤ N
ψ_t(j) = argmax_{1 ≤ i ≤ N} [δ'_{t-1}(i) + a'_ij], 1 ≤ j ≤ N

4. Update time

Set t = t + 1;
Return to step 3 if t ≤ T; otherwise, terminate the algorithm (go to step 5).

5. Termination

P'* = max_{1 ≤ i ≤ N} [δ'_T(i)]
q*_T = argmax_{1 ≤ i ≤ N} [δ'_T(i)]

6. Path (state sequence) backtracking

a. Initialization

Set t = T - 1;

b. Backtracking

q*_t = ψ_{t+1}(q*_{t+1})

c. Update time

Set t = t - 1;
Return to step b if t ≥ 1; otherwise, terminate the algorithm.
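A hedged MATLAB sketch of this log-domain recursion (logA and logPi0 come from the initialization step above; logB(i,t) = log b_i(o_t); all names are illustrative):

    % Log-domain Viterbi sketch following the Alternative Viterbi Algorithm.
    % logA is N-by-N, logPi0 is 1-by-N, logB is N-by-T.
    [N, T] = size(logB);
    delta = zeros(T, N);
    psi = zeros(T, N);
    delta(1, :) = logPi0 + logB(:, 1)';                 % step 2: initialization
    for t = 2:T                                          % step 3: induction
        for j = 1:N
            [best, psi(t, j)] = max(delta(t-1, :) + logA(:, j)');
            delta(t, j) = logB(j, t) + best;
        end
    end
    [logPstar, q] = max(delta(T, :));                    % step 5: termination
    path = zeros(1, T); path(T) = q;
    for t = T-1:-1:1                                     % step 6: backtracking
        path(t) = psi(t+1, path(t+1));
    end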

(c) HMM Recognition Result

(i) Score

The score result is obtained from the Viterbi algorithm. From the calculated value of log1, the probability value of the single best path is saved as the result (output score) of each comparison. Below is the output score for the ayates 'Maaliki yawmiddiini'.

(ii) Log-Likelihood Ratio (LLR)

From the output scores obtained above, the maximum of these probability values is determined using the Log Likelihood Ratio (LLR). The highest output score is the highest probability that an HMM model (comparison model) produced the particular test utterance, ranked against the threshold value set. In this case, the confidence score of the LLR is 0.7253 x 10^3, which is above the threshold value of > 0.2. The calculation and results obtained through this method are discussed in detail in chapter 5.

Figure 3.26 (b): Log-Likelihood Ratio (LLR) for the ayates ‘Maaliki
yawmiddiini’
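A hedged sketch of this decision rule, following equation (21) (the variable names, and wiring the 0.2 threshold mentioned above into code, are illustrative):

    % LLR decision sketch, equation (21): compare the two best model scores
    % normalized by the utterance length N, then apply the threshold.
    sorted = sort(scores, 'descend');          % scores = log-likelihood per model
    LLR = (sorted(1) - sorted(2)) / N;         % N = length of the input utterance
    accepted = LLR > 0.2;                      % threshold value from the text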

3.3 Summary

This chapter has presented a brief technical overview of MFCC and HMM and how the two algorithms relate to each other. MFCC handles the feature extraction process, which produces the feature vector outputs of the Quranic recitation. These outputs form the training set used in HMM classification to train the HMM model. HMM then works as a classification or pattern recognition technique that classifies the different signals of the Quranic recitation based on the calculated Log-Likelihood Ratio (LLR) values.

The combination of MFCC and HMM has been widely used in speaker recognition, especially for the English language. However, applying both algorithms (MFCC & HMM) to Quranic recitation is still considered a new approach. Thus, this research studies the possibility of using this combination in an Automated Tajweed Checking Rules Engine for Quranic Verse Recitation. This chapter has also presented a detailed methodology of the research and of the MATLAB implementation using MFCC and HMM.

CHAPTER 4

DESIGN AND IMPLEMENTATION

4.1 Introduction

This chapter emphasizes the design and implementation of the Automated Tajweed checking rules engine for Quranic verse recitation. It covers, through various diagrams, the logical and physical designs of this application, as well as the algorithms and methodologies involved. Several diagrams are shown, including the most relevant diagrams for Quranic verse recitation recognition based on a speech recognition system, such as the context diagram, data flow diagram, flow chart and others. Finally, this chapter also provides some snapshots of the Quranic verse recitation recognition graphical user interface (GUI).

Figure 4.1: Automated Tajweed Checking Rules for Quranic verse recitation context
diagram

4.2 Overview of Automated Tajweed Checking Rules Engine

This project mainly focuses on basic speech recognition technology, but it is implemented for a different type of application and language, namely Quranic Arabic. This different input content will affect the percentage of accuracy achieved during the recognition process, so the reliability and effectiveness of the system also depend on the language and the system design. The system is implemented using MATLAB as the programming tool. In this project, the system developed is divided into two main parts:

1. Engine Development part

2. Content Development part

Figure 4.2: Overview of Automated Tajweed Checking Rules Engine

4.2.1 Engine Development Part

In the Engine Development part, a speech recognition engine is developed to extract, store and analyze the parameters of Al-Quran recitation. The Mel-Frequency Cepstral Coefficient (MFCC) and Hidden Markov Model (HMM) based algorithms are currently selected for feature extraction and classification (comparison). The processes of speech recording (speech sample collection), feature extraction, feature training and pattern recognition formulate the Quranic verse recitation recognition methodology, which enhances the design of the Tajweed checking rules guidelines shown below. The architecture/block diagram of this part is shown in this chapter, while the processes and algorithms involved are discussed in detail in part 3.2.

4.2.2 Content Development Part

In the Content Development part, the sample Quranic recitations are recited by a certified teacher (Mudarris), and those samples are stored on a PC for analysis purposes. A relevant GUI is also developed in order to provide a user-friendly Automated Tajweed Checking Rules system. The Content Development part is responsible for all of the content, including the preparation of the Al-Quran contents, namely the Al-Quran transcript and the Al-Quran recitation. The Al-Quran transcript is already 100% complete and ready to be used by the Engine Development part. For the Al-Quran recitation, each word of the first chapter of Al-Quran (Al-Fatihah) has been carefully recited by a certified teacher (Mudarris) and stored on a Personal Computer (PC). All the stored files (.wav) are sent to the Engine Development part for integration with the speech processing technology. The Engine and Content Development parts eventually work together to apply the speech recognition technology, in order to analyze both recitations (teacher and student) based on the rules of Tajweed. If a student recites Al-Quran incorrectly, the system shows the errors on the Graphical User Interface (GUI) and plays back the correct recitation.

4.3 Tajweed checking rules engine architecture


The Quranic Arabic recitation is best described as a long, slow-paced, rhythmic, monotone utterance (Essa, O., 1998; Nelson & Kristina, 1985). The sound of the Quranic recitation is recognizably unique and reproducible according to a set of Tajweed pronunciation rules, designed for a clear and accurate presentation of the text. The input of the system is the speech signal and the phonetic transcription of the speech utterance. Thus, this project needs the speaker (input speech sample), feature extraction, feature training and pattern classification/matching components, which are essential to the formulation of the Quranic verse recitation recognition architecture. The main architecture of the Automated Tajweed checking rules engine for Quranic verse recitation adheres to the Engine Development part mentioned earlier in part 4.2.1. This part is divided into 3 main architectures: feature extraction, training/testing and, lastly, recognition. Figure 4.1 shows the context diagram of the Automated Tajweed Checking Rules for Quranic verse recitation, representing the external view of the system: the speaker performs a Quranic recitation via the Tajweed checking rules engine and receives a response from the system after the speech input samples are processed, with training/testing and recognition responding respectively thereafter. The schematic block diagram of the Tajweed checking rules engine is shown in figure 4.3, while the training/testing and recognition architecture is shown in figure 4.4.
is shown in figure 4.4.

Figure 4.3: Block diagram schematic illustrating Tajweed checking rules engine

Figure 4.4: Tajweed checking rules engine architecture

Figure 4.3, the Automated Tajweed Checking Rules engine block diagram, together with the system architecture in figure 4.4, shows the process flow of this research. Figure 4.3 presents the overall Quranic verse recitation recognition process in block diagram form. The block diagram distinguishes 2 phases, the enrolment or training phase and the matching/testing phase, as shown in figure 4.4. The training and matching/testing phases are entirely different processes. In the training phase, each recitor provides samples of Quranic recitation so that the engine can build or train a reference model specific to that recitor. In this part, the researcher only needs to train and store correct data of a certain sourate or Quranic recitation in the database. In the case of the speaker verification process, a specific threshold value can also be computed by the researcher from the training samples. The aim is to provide correct data to serve as the reference for the subsequent recognition process.

On the other hand, the input speech processed in the matching/testing phase is matched against the stored reference model, and a decision can then be made (recognition). The output data from the Hidden Markov Model (HMM) are compared against the database created during the training process. At the same time, the system acts upon the feedback result and answers whether or not the output data match the stored data in the database. If the output data differ even slightly from the stored data in the database, the system treats that output (Quranic recitation) as false/wrong.

4.4 Data Flow Diagram for Tajweed Checking Rules Engine

In this part, the data flow diagram shows the main processes performed by the Tajweed checking rules engine. There are four main processes, performing different tasks, as shown in figure 4.5: receiving the Quranic recitation (speech samples), analyzing the speech, searching and matching the speech, and producing and returning the results to the recitor.

The recitor acts as the source of the speech inputs and receives the processed speech back from the Tajweed checking rules engine. The next processes analyze those speech inputs, followed by searching and matching of the analyzed speech samples. Lastly, the Tajweed checking rules engine produces and returns the matching results to the recitor. The system helps and assists the recitor until the process has executed successfully through to its final destination.
Figure 4.5: Tajweed Checking Rules Engine Data Flow Diagram (DFD)

4.5 Tajweed Checking Rules Engine Flow Chart

The Tajweed Checking Rules Engine flow chart emphasizes the system's flow of events. The engine has 5 main stages: sampling, segmentation, feature extraction, training/testing and recognition/classification. Figure 4.6 shows these stages as well as the processes that occur within each stage.
Figure 4.6: Automated Tajweed checking rules engine for Quranic flow chart

Stage 1:

Referring to figure 4.6, sample input speech is recorded within a particular time frame. The speech input for the utterance is then segmented in order to differentiate speech regions from non-speech regions. Non-speech regions are detected immediately, and only the speech regions are allowed through for further processing.
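The thesis does not specify the exact detector used for this stage, so the following MATLAB sketch assumes a simple short-time energy detector; the frame length, hop size and threshold are illustrative only:

% Assumed short-time energy detector for separating speech from
% non-speech regions (frame sizes and threshold are illustrative).
[x, fs] = audioread('Bismillah.wav');
x = x(:, 1);                           % use one channel
frameLen = 256; hop = 128;
nFrames = floor((length(x) - frameLen)/hop) + 1;
E = zeros(1, nFrames);
for i = 1:nFrames
    f = x((i-1)*hop + (1:frameLen));
    E(i) = sum(f.^2);                  % short-time energy of this frame
end
isSpeech = E > 0.05*max(E);            % keep frames above the energy threshold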

Stage 2:

The speech regions become the input to the phoneme segmentation module, where the basic level of segmentation is performed. After the segmentation process, the MFCC feature extraction module extracts features from those speech signals; MFCCs are extensively used as feature vectors in speech recognition systems.

Stage 3:

The next process is HMM classification, together with phoneme classification. HMM classification (recognition) covers both training and testing, and is mainly used for the tajweed rules checking process. In the training part, a set of training speech is used to construct a model for each word/phoneme, regardless of the recitor.

Stage 4: (Checking tajweed character/database)

To develop the engine database, the HMM training process needs to be executed. The training process mainly involves the tasks of the Content Development part mentioned earlier in part 4.2.2. Here, the recitors need to train/repeat a set of words/phonemes or phrases of the Quranic recitation, and the comparison algorithm is adjusted to match the initial training data set. Each word or phoneme in the vocabulary is connected to a Hidden Markov Model using the values obtained from HMM modeling, such as A, mu and sigma. These values obtained from HMM modeling (A, mu and sigma) are used as reference patterns and stored in the database.

Stage 5: (End of utterance?)

Each line of the Quranic recitation (represented as an array of input sample values), based on the ayates, is arranged in sequence, line by line, in the MATLAB array editor. Based on the arrangement of phonemes in that array editor, the values of (A, mu and sigma) are obtained from HMM modeling (HMM training) for the line specified for each ayate in a certain sourate (Al-Fatihah). The values for each line of the particular phonemes of an ayate are used as reference patterns; the loop over new inputs of Quranic recitation executes line by line, working through the reference patterns in turn until the loop ends (based on the line parameter set) and completes.
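A hedged sketch of this line-by-line loop, with hypothetical names (refPatterns holding each line's (A, mu, sigma) values and hmm_loglik standing in for the HMM scoring routine used in this work):

% Hypothetical line-by-line matching loop over the stored reference
% patterns; all names below are illustrative, not the thesis' actual code.
for ln = 1:numel(refPatterns)
    ref = refPatterns{ln};                         % reference for this line
    score(ln) = hmm_loglik(ref, inputLines{ln});   % match the new input
end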

Stage 6: Result (Response)

The values obtained at this stage are the recognition results. They are obtained after real-time acquisition of the Quranic recitation, the speech processing stage and HMM modeling have executed. The process then continues with the recognition procedure, where the values are compared with all codebook models (reference patterns) in order to obtain the maximum likelihood ratio. Only the maximal value corresponds to the recognized word.
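As a hedged sketch of this maximum-likelihood comparison (model file names as stored later in table 5.3; hmm_loglik is again a hypothetical stand-in for the actual HMM scoring routine):

% Score the extracted features against every stored reference model and
% take the most likely model as the recognized word (illustrative sketch).
models = {'Bismillah_model.mat', 'ayat1_model.mat', 'ayat2_model.mat'};
score = -inf(1, numel(models));
for m = 1:numel(models)
    ref = load(models{m});                   % stored (A, mu, sigma) values
    score(m) = hmm_loglik(ref, features);    % log-likelihood of the input
end
[bestScore, bestModel] = max(score);         % maximum likelihood decision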

4.6 Tajweed Checking Rules Graphical User Interfaces

The automated Tajweed Checking Rules engine designed here must be practical to implement: flexible, user friendly and easy for the user to visualize. One way to achieve this in the project's development is through a Graphical User Interface (GUI).

In this part, both the logical and the physical aspects of the Tajweed checking rules engine are presented and visualized using the Graphical User Interface (GUI). The functional requirements of the Tajweed Checking Rules engine are also described through these graphical representations.

Figure 4.7: Automated Tajweed Checking Rules Engine for Quranic verse Recitation
Graphical User Interface

Figure 4.8 below shows the list of selected items for the particular ayates that were recorded earlier and need to be loaded into the engine for further analysis and matching.

Figure 4.8: Load the wave file of input speech sample from sourate Al-Fatihah

After the input speech sample has been selected and loaded into the system, the process continues with analysis. The analyzing part mainly extracts the features from the sourate Al-Fatihah input speech sample in order to obtain the feature vectors. The GUI visualization of this part can be seen in figure 4.9 and figure 4.10.
Figure 4.9: Analyzing process of sourate Al-Fatihah using MFCC (Started)

Figure 4.10: Analyzing process of sourate Al-Fatihah using MFCC (Finished)

[Figure: waveform (Amplitude vs. Time [sec]) and spectrogram (Frequency [kHz] vs. Time [sec]) of the speech sample of Quranic recitation]
Figure 4.11: The input speech sample and spectrogram graph for ‘Bismillah’ utterance

Once the analyzing process has completed successfully, the process proceeds to the matching analysis; that is, the Tajweed checking process is executed in this part. If the Quranic recitation is pronounced incorrectly, the engine notifies the user (recitor) of any false word(s) in the ayates involved. The engine shows errors on the Graphical User Interface (GUI) for any incorrect recitation of Al-Quran. Next, the engine guides the user (recitor) towards the correct ayates to be followed, or recited in order, by playing back the correct recitation. This is shown in the GUIs in figure 4.12, figure 4.13, figure 4.14 and figure 4.15.

Figure 4.12: The incorrect recitation of ‘Bismillah’ utterance (1st mistake/notification)

Figure 4.13: The incorrect recitation part involved and Tajweed rules

Figure 4.14: The incorrect recitation of ‘Bismillah’ utterance (2nd mistake/notification)

Figure 4.15: The incorrect recitation part involved and Tajweed rules

Figure 4.16: The correct recitation of ‘Bismillah’ utterance

On the other hand, if the Quranic recitation is correct, both in its recitation and in its tajweed rules, the engine returns a result showing the match for the ayates recited by the user (recitor), as visualized in the figures below:

Figure 4.17: The notification of correct recitation of ‘Arrahmaanirrahiim’ utterance

Figure 4.18: The correct recitation of ‘Arrahmaanirrahiim’ utterance

4.7 Summary

This chapter presented both the logical and the physical aspects of the Automated Tajweed Checking Rules for Quranic verse Recitation and provided a visualization of the engine's main graphical user interface. It also conveyed an understanding of the engine's functional requirements through graphical representations.

CHAPTER 5

EXPERIMENTAL RESULTS AND DISCUSSION

5.1 Introduction

In this chapter, the relevant experimental results are presented based on the findings of this research, and discussed in detail following the system chronology described in the methodology in chapters 3 and 4. The aim is to clearly show the experimental results, starting from the collection of speech samples, followed by feature extraction, then feature training and lastly feature matching/testing. This last part, feature matching/testing, is the main part that evaluates the performance of the Tajweed checking rules engine, focusing on recognition rate.

5.2 Speech Samples Collection (Recording process)

In this section, the main concern is the collection of speech samples from 5 different speakers (recitors) through a recording process. Each distinct word (ayates in sourate Al-Fatihah) was recorded, and the speech samples were saved for further processing. The collected speech samples comprised 52 words (ayates) and 82 phoneme samples drawn from those ayates across the different samples of Quranic recitation. These samples are used in training the Hidden Markov Model (HMM) and also in the testing part. Speech samples were recorded in a constrained environment, with 5 selected speakers (recitors) who were highly trained in Quranic recitation according to the Tajweed rules. The first chapter of Al-Quran (Al-Fatihah) was recited, each recording approximately 4 seconds long, in '.wav' file format; a sketch of the recording step is shown below. Table 5.1 summarizes the collected speech samples of Sourate Al-Fatihah.
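A minimal MATLAB sketch of this recording step; 16 kHz, 16-bit, mono are assumed parameters, as the thesis does not state the exact recording set-up:

% Record roughly 4 seconds of one ayate and store it as a .wav file
% (16 kHz, 16-bit, mono are assumed recording parameters).
fs  = 16000;
rec = audiorecorder(fs, 16, 1);
recordblocking(rec, 4);                 % capture about 4 s of recitation
x = getaudiodata(rec);
audiowrite('Bismillah.wav', x, fs);     % save for analysis and training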

Table 5.1: Excerpt from the dictionary of Sourate Al-Fatihah

The word in the dictionary (wave file assigned) | The utterances (phonemes) | The ayate in Al-Quran

Bismillahirrahmanirrahim (Bismillah.wav) | Bismi, Llahii, Rraohimani, Rraohiiim | بسم الله الرحمن الرحيم

Alhamdu lillahi rabbi alAAalameen (fatihah1.wav) | Allhamdu, Lillahhirabbil, A'alamiinna | الحمد لله رب العالمين

Arrahmaanirrahiim (fatihah2.wav) | Alrrahmani, Alrraheemi | الرحمن الرحيم

Maalikiyawmiddiini (fatihah3.wav) | Maaliki, Yawmi, Alddeeni | مالك يوم الدين

Iyyakana'Abudu waiyyaka nastaeen (fatihah4.wav) | Iyyaka, naA'Abudu, waiyyaka, nastaAAeenu | إياك نعبد وإياك نستعين

Ihdinaassiratholmustakiim (fatihah5.wav) | Ihdina, Alssiratho, Almustaqeema | اهدنا الصراط المستقيم

SiraathollazinaAn'amta'Alaihim ghayrillmaghdoobi'Alaihim waladdholeen (fatihah6.wav & fatihah7.wav) | Siratho, Allatheena, An'Aamta, 'AAalayhim, Ghayri, Almaghdoobi, 'AAalayhim, Wala, Alddhalleena | صراط الذين أنعمت عليهم غير المغضوب عليهم ولا الضالين

For the phoneme templates, the sound file of each ayate of sourate Al-Fatihah was segmented into individual files by cutting out only the desired part or specified region (Region of Interest), using the GoldWave editor. The parameters of these inputs were set up identically, in order to avoid any inconsistency in the resulting values. The summary of the phonemes collected from the speech samples of the 8 ayates is listed in table 5.2.

Table 5.2: Summary of the Total Collected Speech Samples for each Ayates

Ayates in wave file | No. of collected speech samples
Bismillah.wav | 17 samples
fatihah1.wav | 9 samples
fatihah2.wav | 6 samples
fatihah3.wav | 7 samples
fatihah4.wav | 11 samples
fatihah5.wav | 9 samples
fatihah6.wav | 11 samples
fatihah7.wav | 12 samples
Total no. of speech samples | 82 samples

5.3 Result of Feature Extraction

In this part, the experimental results of the MFCC (Mel-Frequency Cepstral Coefficient) algorithm for feature extraction are presented. Feature extraction was applied to all 52 word and 82 phoneme samples of collected Quranic recitation. MFCC cepstral coefficient values are obtained from each input speech sample and then transformed into feature vector format as output. For the example input, the resulting data comprised 398 columns (frames), each holding a 13-element feature vector (12 coefficients + 1 log energy).
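To make this front end concrete, the following is a minimal MATLAB sketch of such a 13-dimensional extractor; the frame length, hop, filterbank size and FFT size are illustrative assumptions rather than the exact settings used in this work, and the Signal Processing Toolbox functions hamming and dct are assumed to be available:

% Minimal MFCC front-end sketch: 12 cepstral coefficients + log energy
% per frame (parameter values are illustrative assumptions).
[x, fs] = audioread('Bismillah.wav');
x = x(:, 1);                                   % use one channel
N = round(0.025*fs);                           % 25 ms frames
H = round(0.010*fs);                           % 10 ms hop
nFrames = floor((length(x) - N)/H) + 1;
nfft = 2^nextpow2(N); nMel = 26; nCep = 12;
mel  = @(f) 2595*log10(1 + f/700);             % Hz -> mel
imel = @(m) 700*(10.^(m/2595) - 1);            % mel -> Hz
edges = imel(linspace(0, mel(fs/2), nMel + 2));
bins  = floor(edges/fs*nfft) + 1;              % FFT bin index of each edge
FB = zeros(nMel, nfft/2 + 1);                  % triangular mel filterbank
for m = 1:nMel
    FB(m, bins(m):bins(m+1))   = linspace(0, 1, bins(m+1) - bins(m) + 1);
    FB(m, bins(m+1):bins(m+2)) = linspace(1, 0, bins(m+2) - bins(m+1) + 1);
end
C = zeros(nCep + 1, nFrames);                  % 13 x nFrames feature matrix
w = hamming(N);
for i = 1:nFrames
    frame = x((i-1)*H + (1:N)) .* w;
    P = abs(fft(frame, nfft)).^2;              % power spectrum
    melE = log(FB * P(1:nfft/2 + 1) + eps);    % log mel-band energies
    c = dct(melE);                             % cepstrum via DCT
    C(:, i) = [c(2:nCep + 1); log(sum(frame.^2) + eps)];
end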

5.4 Result of Features Training

After the feature extraction process has executed, the recognition process compares the extracted features with a reference model. This reference model is developed once the enrolment or training phase has been successfully implemented. In this case, the reference models (the models stored in the database) consist of 2 types: a Word based Model and a Phoneme based Model. The phoneme based reference model differs completely from the word based model, in which the extracted speech features are compared directly to the word templates; in that direct matching model, each word template is stored as a vector of feature parameters. The word based model is used as the first model, while the phoneme based model is the second model, used for template matching in the testing/recognition part. The phoneme based model is a phoneme-like template matching, where the word templates are stored as phoneme-like template parameters. The phoneme based model is discussed in detail later, under the Tajweed checking rules database and testing (phoneme-like template).

Here, the experimental work of performing Hidden Markov Model (HMM) feature training has been completed. The feature vectors produced by the Mel Frequency Cepstral Coefficient (MFCC) stage are combined to create the database that serves as the HMM model, used specifically to provide the template matching while training the data. The feature vectors of each distinct word are combined to create a database for that word, where the values of λ = (A, pi0, mu, sigma) are evaluated and stored in the database. Table 5.3 shows the result of creating the HMM models for the particular recitations of Al-Quran during the enrolment or training phase, in (.mat) format. Each distinct word in the dictionary was trained against the initial HMM model template mentioned earlier, for 8 training iterations.
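A sketch of how one trained model could be bundled and stored as a .mat reference pattern; the field names here are illustrative assumptions, not the thesis' actual structure:

% Bundle the trained parameters lambda = (A, pi0, mu, sigma) for one
% word/ayate and store them in the database (field names illustrative).
model.A     = A;        % state transition matrix
model.pi0   = pi0;      % initial state distribution
model.mu    = mu;       % Gaussian mean vectors per state
model.sigma = sigma;    % Gaussian variances per state
save('Bismillah_model.mat', '-struct', 'model');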

Table 5.3: Template Data of HMM Model for Collected Quranic Recitations

The word in the dictionary | Wave file assigned | HMM model (word/ayates-like template) | HMM model (phonemes-like template)
Bismillahirrahmanirrahim | Bismillah.wav | Alfatihah_model.mat | Bismillah_model.mat
Alhamdu lillahi rabbi alAAalameen | fatihah1.wav | Alfatihah_model.mat | ayat1_model.mat
Arrahmaanirrahiim | fatihah2.wav | Alfatihah_model.mat | ayat2_model.mat
Maalikiyawmiddiini | fatihah3.wav | Alfatihah_model.mat | ayat3_model.mat
Iyyakana'Abudu waiyyaka nastaeen | fatihah4.wav | Alfatihah_model.mat | ayat4_model.mat
Ihdinaassiratholmustakiim | fatihah5.wav | Alfatihah_model.mat | ayat5_model.mat
SiraathollazinaAn'amta'Alaihim ghayrillmaghdoobi'Alaihim waladdholeen | fatihah6.wav & fatihah7.wav | Alfatihah_model.mat | ayat6_model.mat

As mentioned earlier, the system contains 2 separate HMM model templates built from the training corpus. The first model is the Word (ayates) Template, while the second is the Phoneme-Like Template. Two tests were composed from the corpus samples of Quranic recitation; these tests are described in part 5.5. From the corpus, 82 phoneme-like template samples of Quranic recitation are produced and converted into phoneme strings using the Quranic pronunciation rules. The templates were taken from the 8 words (ayates) of sourate Al-Fatihah and manually arranged into 7 model files stored in the database as HMM models (.mat), as shown in table 5.3. These models not only recognize the phonemes but also check the tajweed rules that govern the recitation of Al-Quran. In each experiment executed, both the training utterances and the word templates are from the same speaker (recitor).

5.4.1 Tajweed Checking Rules Database

The database contains the 8 ayates of Sourate Al-Fatihah with 52 samples of utterances, plus 28 phonemes from those ayates with 82 samples of input phonemes. The engine scans the input Holy Quran Ottoman sound and text, searching for symbols and features, and generates for each probable pronounced character its code, pronunciation status and acoustic characteristics (such as voicing, place of articulation, nasalization and aspiration). The engine then analyzes those codes and characteristics and generates the corresponding correct phonetic transcription, according to the Quranic recitation rules and their exceptions. The HMM enrolment/training part gathers all this information in order to develop phoneme based templates (recitation patterns) at probable pronunciation locations. These pronunciation patterns are matched against the pronunciation variant rules during the matching and testing process. The engine database contains 10 rules covering pronunciation errors in Quranic recitation, and these recitation error hypotheses are listed below:

Table 5.4: The Tajweed pronunciation rules in Sourate Al-Fatihah

Word/Sentence/Phoneme | Ahkam al-Tajweed
(Arabic text) | Mad Asli Mutlak
(Arabic text) | Idgham Syamsi: alif lam meets ra; alif lam meets dal; alif lam meets syad; alif lam meets zai
(Arabic text) | Mad 'arid Lissukun: letter of mad has been Waqf (Stop)
(Arabic text) | Izhar Syafawi: mim sukoon meets dal; mim sukoon meets ta'; mim sukoon meets ghain; mim sukoon meets wau
(Arabic text) | Izhar Qamari: alif lam meets 'ain
(Arabic text) | Izhar Halqi: nun sukoon meets 'ain

Besides the tajweed rules listed above, 4 additional Ahkam al-Tajweed are also checked: Iqlab, Idgham Bila Ghunnah, Idgham Ma'al Ghunnah and Ikhfa' Haqiqi.
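As an illustration of how a detected error location could be mapped to one of the rule notifications above, a hypothetical lookup (keys and messages are invented for this sketch, following table 5.4) might be:

% Hypothetical mapping from a mispronounced phoneme to the tajweed rule
% notification shown on the GUI (keys and strings illustrative only).
rules = containers.Map( ...
    {'rahmaanir', 'rahiimi'}, ...
    {'Mad Asli Mutlak', 'Mad ''arid Lissukun: letter of mad at Waqf (stop)'});
notify = rules('rahmaanir');    % rule text attached to the error notification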

5.5 Result of Features Matching/Testing

This section presents the experimental results of performing the MFCC algorithm for feature extraction on the Quranic recitation speech samples and then matching/testing them against the trained HMM (Hidden Markov Model) data templates, using the same HMM classification method. As mentioned in part 5.4, these data templates are used for template matching, a form of pattern recognition in which each word/ayate or phoneme is stored as a separate template (phoneme-like templates and word (ayates) templates). Both templates are used as reference models (template matching) for the recognition task. In this task, any input passing through the engine is compared with the stored templates, and the template that most closely matches the incoming speech pattern identifies the recognized word (ayate) or phrase/phoneme.

The automated Tajweed checking rules engine acts upon any Quranic recitation whenever it receives an input speech signal, because any speech passing through the system yields an output score that causes the engine to make judgements. Thus, a score value measuring the confidence of a recognized word needs to be found. In addition, the ayates and phonemes are classified under 2 different probabilities, either In Vocabulary (IV) data or Out of Vocabulary (OOV) data, to ensure that the engine is capable of checking the tajweed rules. The basic idea for separating IV and OOV phonemes/words is that the likelihood difference between the best and 2nd best results of an OOV input is smaller than that of an IV input, because no model matches the OOV input. As mentioned earlier in chapter 3, the standard Log Likelihood Ratio (LLR) and augmented LLR are used, via the equation below:

nLLR = (1/N) [log P(best | O) - log P(2nd best | O)]        (1)

where N is the length of the input utterance, log P(best | O) is the largest log likelihood and log P(2nd best | O) is the second largest.

According to Yongwon, J. et al. (2001), the log likelihood of the word itself is not an appropriate measurement for setting the threshold value; hence equation (1) is used in their research. In this case, equation (1) needs to be modified in order to improve reliability. If the input utterance of an IV word/phoneme changes a little, the recognition result does not change much, owing to the relatively large likelihood difference between the best and 2nd best results. For an OOV word/phoneme, however, the result for a changed input is highly likely to differ from that of the original input. Because of this, a perturbed input is employed in order to improve the robustness of the confidence score. Several methods are applied to perturb the input feature vector, such as:

coef1 = k1*coef;        % scale the whole feature vector
coef2 = coef - k2*mc;   % shift by the mean of the feature vector
coef3 = coef - k3*Oc;   % shift by the standard deviation vector

In the formulas above, coef is the feature vector, mc is the mean vector of the feature vector for the input speech and Oc is the standard deviation vector of the feature vector for the input speech. k1, k2 and k3 are constant values which need to be adjusted so that the percentage divergence between the recognition results of the original and perturbed feature vectors remains < 10%, especially for IV words/phonemes. After coef2 has been perturbed, if the recognized word remains unchanged, a certain value k is added to the log likelihood (LLR), as follows:

if Wo == Wp
    LLRA = LLR + k;     % recognized word unchanged under perturbation
else
    LLRA = LLR;         % recognized word changed under perturbation
end

Here, Wo is the recognized word from the original input feature vector and Wp is the recognized word from the perturbed input feature vector. The threshold value for LLRA is set by training on IV inputs and OOV inputs. The LLRA result is then obtained once the testing process has executed successfully. If LLRA > threshold, the input is considered an IV word/phoneme; if LLRA < threshold, it is considered an OOV word/phoneme. This threshold setting can, however, be changed, depending on the MATLAB program developed. After LLRA had been implemented, the results obtained were not perfect, especially for the recognition of the Tajweed Checking rules, which is expressed in terms of phonemes. LLRA had been presented and used before, by Yongwon, J. et al. (2001), but for single or direct word recognition.

In this case, LLRA is not suitable for phoneme inputs, because the LLR values for IV phonemes and OOV phonemes are almost the same. Thus, another alternative to LLR was adopted: the ratio of the LLR to the largest LLR value, given by the equation below:

Diff_ratio = [log P(best | O) - log P(2nd best | O)] / log P(best | O)        (2)

Two tests were performed on the system in order to evaluate its performance. As mentioned in part 5.4, in every experiment both the training utterances and the word templates are from the same speaker. Table 5.6 and table 5.9 show the overall results of the two testing sets, respectively.

5.5.1 Testing - Word (ayates) Like-Template

In this part, the LLR threshold value is -1100, with a difference-ratio value of 0.2. If LLRA > -1100, the input is considered an IV word; if LLRA < -1100, it is considered an OOV word. Moreover, the results obtained from equation (2) give diff_ratio values for IV inputs that are almost all bigger than 0.2, while most OOV inputs give values less than 0.2. This can be seen in table 5.5 below, where the values highlighted in red represent LLR values above 0.2 for the IV words, while the LLR values highlighted in blue represent the OOV words (LLR values less than 0.2). That is, all 8 ayates of Sourate Al-Fatihah shown below were categorized as IV words. In the application of this engine, whenever an input is claimed as an OOV word/ayate, a notification of the incorrect recitation of Sourate Al-Fatihah is given, together with references to the relevant Tajweed Rules, for evaluation purposes. Whenever an IV input is identified as IV, a correct IV detection notification is given, and the identified ayate of Sourate Al-Fatihah with the correct recitation is heard all along.
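A hedged MATLAB sketch of this word-level decision rule, using the thresholds quoted above; logP is assumed to hold the log likelihoods of the input against all 8 word models, and combining both checks below is an assumption of this sketch:

% Word-level IV/OOV decision as described in this section (illustrative).
s = sort(logP, 'descend');                 % best and 2nd-best scores
LLR = s(1) - s(2);                         % per eq. (1), without 1/N scaling
diff_ratio = (s(1) - s(2))/s(1);           % per eq. (2)
if LLR > -1100 && diff_ratio > 0.2
    disp('IV word: correct recitation of this ayate');
else
    disp('OOV word: notify the recitor and play back the correct recitation');
end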

Table 5.5: Result of Likelihood Ratio (LLR) for 8 recitations of speech samples (1.0 x 103)
Sequence x1 x2 x3 x4 x5 x6 x7 x8

logP(X│Θ1) 0.2112 -3.9878 -4.4179 -4.6103 -5.1018 -5.4842 -5.6575 -5.7628

MLM 1 3 6 7 8 5 2 4

logP(X│Θ2) 0.4394 -4.7675 -4.8948 -4.9438 -5.1501 -5.2021 -5.5128 -5.8265

MLM 2 7 8 5 6 1 4 3

logP(X│Θ3) 0.2472 -3.6302 -3.9481 -4.5353 -4.8883 -5.0468 -5.1351 -5.1712

MLM 3 1 6 7 2 8 4 5

logP(X│Θ4) 0.7253 -3.9347 -4.1251 -4.2471 -4.4244 -4.5630 -4.6807 -4.8629

MLM 4 6 1 5 7 3 8 2

logP(X│Θ5) 0.2659 -4.8868 -5.7913 -5.9782 -6.6163 -7.6434 -7.6572 -8.4972

MLM 5 6 7 1 8 3 4 2

logP(X│Θ6) 0.2667 -4.4097 -4.8590 -4.8904 -4.9843 -5.3690 -5.8303 -7.7457

MLM 6 7 5 8 1 3 4 2

logP(X│Θ7) 0.6612 -4.6829 -5.1914 -5.3106 -5.4521 -6.4570 -6.9848 -7.8626

MLM 7 6 8 1 5 3 4 2

logP(X│Θ8) 0.8678 -4.3930 -4.6584 -4.8508 -5.1978 -5.8682 -6.4164 -7.1213

MLM 8 1 7 5 6 4 3 2

MLM = Most Likely Model

Table 5.6: Test result for 8 recitations of speech samples (ayates of sourate Al-Fatihah)

Ayates/Articulation | # of utterances | Correct | Wrong | % Accuracy | % Word error rate
Bismillah.wav | 5 | 5 | 0 | 100 | 0
fatihah1.wav | 5 | 5 | 0 | 100 | 0
fatihah2.wav | 7 | 7 | 0 | 100 | 0
fatihah3.wav | 6 | 6 | 0 | 100 | 0
fatihah4.wav | 9 | 8 | 1 | 88.89 | 11.11
fatihah5.wav | 9 | 9 | 0 | 100 | 0
fatihah6.wav | 6 | 4 | 2 | 66.67 | 33.33
fatihah7.wav | 5 | 4 | 1 | 80 | 20
Total | 52 | 48 | 4 | 91.95 | 8.05

For the first test, the 8 ayates of sourate Al-Fatihah were tested; the results are shown above in table 5.6. In this experiment, the extracted features of the 8 ayates of Quranic recitation were compared directly to the word templates (Word based Model). As a result, the test on the training data reached 91.95% accuracy, with only 4 errors and a Word Error Rate (WER) of 8.05%. This is better than the results of previous research by Ehab, M. et al. (2007) and Anwar, M.J. et al. (2006), whose recognition accuracy rates were 85% and 89% respectively.

5.5.2 Testing – Phonemes-Like Template

As mentioned earlier, the phoneme-like template experiment was carried out in order to check the Tajweed rules for the particular ayates of a given Quranic recitation. Note that the threshold value for the phoneme-like template experiment is -500, with a difference-ratio value of 0.01. The threshold setting differs from the previous test: if LLRA > -500, the input is considered an OOV phoneme, while if LLRA < -500, it is considered an IV phoneme. If a particular utterance is detected as an OOV phoneme, identification and verification of the pronunciation rule error (Tajweed rules) is executed; that is, the pronunciation of that particular Quranic recitation is detected as false/incorrect. Table 5.7 and table 5.8 below show the experimental results for the two sample phonemes of "Bismillahir <rahmaanir> rahimi" and "Bismillahir rahmaanir <rahiimi>" respectively, for better understanding.

Table 5.7: Comparison between correct and incorrect Tajweed rules for ayates “Bismillahir
<rahmaanir> rahimi”
Correct Recitation Incorrect Recitation

Ayates

The
utterances Bismillahir RAHMAANIR rahimi Bismillahir RAHMUUNIR rahimi
(Articulation)
Score: Score:
Output Score 1.0e+003 * 1.0e+003 *
Columns 1 through 3 Columns 1 through 3
-1.5521 -1.3030 -2.1808 -1.2703 -0.9670 -2.2708
Columns 4 through 6 Columns 4 through 6
-2.2018 -0.7968 -1.1091 -1.4974 -0.8738 -0.9279
Columns 7 through 9 Columns 7 through 9
-0.6398 -0.6541 -0.5685 -0.5621 -0.7777 -0.8362
Columns 10 through 12 Columns 10 through 12
-0.8995 -1.0463 -0.8684 -0.7123 -0.9958 -0.7422
Columns 13 through 15 Columns 13 through 15
-1.1624 -0.6604 -1.0446 -0.9294 -0.4929 -1.0265
Columns 16 through 17 Columns 16 through 17
-0.6033 -0.7845 0.0544 -0.9155
LLR: LLR:
Log-likelihood 1.0e+003 * 1.0e+003 *
(LLR) Columns 1 through 3 Columns 1 through 3
-0.5685 -0.6033 -0.6398 0.0544 -0.4929 -0.5621
Columns 4 through 6 Columns 4 through 6
-0.6541 -0.6604 -0.7845 -0.7123 -0.7422 -0.7777
Columns 7 through 9 Columns 7 through 9
-0.7968 -0.8684 -0.8995 -0.8362 -0.8738 -0.9155

Columns 10 through 12 Columns 10 through 12
-1.0446 -1.0463 -1.1091 -0.9279 -0.9294 -0.9670
Columns 13 through 15 Columns 13 through 15
-1.1624 -1.3030 -1.5521 -0.9958 -1.0265 -1.2703
Columns 16 through 17 Columns 16 through 17
-2.1808 -2.2018 -1.4974 -2.2708
Tajweed Rules | - | Mad Asli Mutlak

Table 5.8: Comparison between correct and incorrect Tajweed rules for ayates “Bismillahir
rahmaanir <rahiimi>”
Correct Recitation Incorrect Recitation

Ayates

The
utterances Bismillahir rahmaanir RAHIIMI Bismillahir rahmaanir RAHUUMI
(Articulation)
Score: Score:
Output Score 1.0e+003 * 1.0e+003 *
Columns 1 through 3 Columns 1 through 3
-2.0779 -1.6710 -1.9139 -1.8007 -1.6138 -2.5081
Columns 4 through 6 Columns 4 through 6
-2.0321 -1.2066 -1.1630 -2.7402 -0.9999 -1.0721
Columns 7 through 9 Columns 7 through 9
-1.1592 -1.2839 -0.8137 -0.6768 -0.7334 0.0782
Columns 10 through 12 Columns 10 through 12
-1.5029 -1.6649 -1.5198 -1.1591 -1.3392 -1.0342
Columns 13 through 15 Columns 13 through 15
-1.6956 -0.8598 -1.4082 -1.4091 -0.7923 -1.4355

Columns 16 through 17 Columns 16 through 17
-1.1358 -1.7441 -0.6912 -0.9625
LLR: LLR:
Log-likelihood 1.0e+003 * 1.0e+003 *
(LLR) Columns 1 through 3 Columns 1 through 3
-0.8137 -0.8598 -1.1358 0.0782 -0.6768 -0.6912
Columns 4 through 6 Columns 4 through 6
-1.1592 -1.1630 -1.2066 -0.7334 -0.7923 -0.9625
Columns 7 through 9 Columns 7 through 9
-1.2839 -1.4082 -1.5029 -0.9999 -1.0342 -1.0721
Columns 10 through 12 Columns 10 through 12
-1.5198 -1.6649 -1.6710 -1.1591 -1.3392 -1.4091
Columns 13 through 15 Columns 13 through 15
-1.6956 -1.7441 -1.9139 -1.4355 -1.6138 -1.8007
Columns 16 through 17 Columns 16 through 17
-1.7441 -1.9139 -2.5081 -2.7402

Tajweed Rules | - | Mad 'arid Lissukun: letter of mad has been Waqf (Stop)

According to the LLR results in tables 5.7 and 5.8 above, the values highlighted in red represent the IV phonemes (LLR value less than 0.01), while the LLR values highlighted in blue represent the OOV phonemes, with values above 0.01. In this case, two different phonemes from the ayates "Bismillahir <rahmaanir> rahimi" and "Bismillahir rahmaanir <rahiimi>" were successfully tested. The results obtained for the two phonemes are -0.5685 and -0.8137, which are below the LLR threshold (LLR < -500) and are therefore classified as IV phonemes (correct recitation). On the other hand, the LLR values highlighted in blue (0.0544 and 0.0782) were categorized as OOV phonemes (incorrect recitation), since these values lie above the LLR threshold (LLR > -500). For the first phoneme, the incorrect recitation is claimed as a 'Mad Asli Mutlak' tajweed pronunciation error, where the phoneme needs to be pronounced as 'rahmaanir' and not 'rahmuunir', with 2 haraakat of recitation. The pronunciation of the 2nd phoneme was also detected as false with respect to the tajweed rule, claimed as Mad 'arid Lissukun (letter of mad has been Waqf (Stop)), since the phoneme needs to be pronounced as 'rahiimi' and not 'rahuumi'.

The sample phoneme results shown in tables 5.7 and 5.8 are 2 of the 28 Quranic recitation phonemes in the overall result used to check the Tajweed Rules in this sourate. In this experiment, shown in table 5.9 below, the feature vectors from the input phonemes matched the phoneme based templates with an accuracy of 86.41% and an error rate of only 14.34%. Although the accuracy in this experiment is somewhat smaller than the previous result in table 5.6, the result is still within expectation, because this experiment involved a large number of samples, particularly for testing purposes. The experiment also showed that the current method is much simpler than LLRA, since it only needs to calculate the perturbed value, with an easier calculation.

Table 5.9: Test result for 28 recitations of speech samples (Phonemes)

Ayates | Phonemes | # of utterances | Correct | Wrong | % Accuracy | % WER
Bismillah.wav | Bismi, Llahii, Rraohimani, Rraohiiim | 17 | 16 | 1 | 94.12 | 5.88
fatihah1.wav | Allhamdu, Lillahhirabbil, A'alamiinna | 9 | 8 | 1 | 88.89 | 11.11
fatihah2.wav | Alrrahmani, Alrraheemi | 6 | 6 | 0 | 100 | 0
fatihah3.wav | Maaliki, Yawmi, Alddeeni | 8 | 8 | 0 | 100 | 0
fatihah4.wav | Iyyaka, naA'Abudu, waiyyaka, nastaAAeenu | 11 | 8 | 3 | 72.72 | 33.3
fatihah5.wav | Ihdina, Alssiratho, Almustaqeema | 9 | 8 | 1 | 88.89 | 11.11
fatihah6.wav | Siratho, Allatheena, An'Aamta, 'AAalayhim | 12 | 8 | 4 | 66.67 | 33.33
fatihah7.wav | Ghayri, Almaghdoobi, 'AAalayhim, Wala, Alddhalleena | 10 | 8 | 2 | 80 | 20
Total (28 phonemes) | | 82 | 70 | 12 | 86.41 | 14.34

Figures 5.1 and 5.2 below are bar charts of the percentage of Accuracy and the Word Error Rate (WER) for the ayates and phonemes, summarizing the overall results of the experiments above for both ayates and phonemes.

[Bar chart: percentage of accuracy (%) per ayates/phonemes wave file]

Figure 5.1: Percentage of accuracy for recognition rate (Ayates & Phonemes)

[Bar chart: Word Error Rate (%) per ayates/phonemes wave file (.wav)]

Figure 5.2: Percentage of Word Error Rate (WER) for ayates & phonemes

Based on figure 5.1, 2 ayates achieved 100% accuracy for both ayates and phonemes: fatihah 2 and fatihah 3 (ayates 2 & 3). Hence, the WER for these ayates remains 0%, with no error detected, as shown in the bar chart in figure 5.2. This is probably because those ayates are short sentences and phonemes, avoiding complexity during the matching and recognition processes. Meanwhile, fatihah 6 (ayate 6) achieved the smallest accuracy: its ayates and phonemes reached only 66.67% accuracy, while its WER reached the highest percentage, 33.33%. The rationale behind this result is probably the complexity of pronouncing this ayate, as well as the difficulty of matching and recognizing the exact utterance properly.
5.6 Summary

The overall process conducted in this research is shown clearly in the Data Flow Diagram (DFD) and flowchart for the Tajweed Checking Rules Engine in the previous chapter 4. Based on this DFD and flowchart, the processes involved in this research can be clearly seen and justified.

The experimental results obtained fulfilled the targeted criteria and goals set and planned earlier, although there were some limitations from unexpected problems that occurred while recording and running the simulation process.

CHAPTER 6

CONCLUSION AND FUTURE ENHANCEMENT

6.1 Introduction

In this chapter, the processes involved in this project are discussed briefly, and recommendations for future enhancement of the overall research are presented, in order to make the system more efficient and sophisticated. The chapter also highlights the significance and contributions of this research, explains the weaknesses as well as the strengths of the research, and offers propositions for improvement and future work. Lastly, the entire research is summarized at the end of the chapter.

6.2 Significance and Contributions of Tajweed Checking Rules engine for Quranic

verse recitation

The significance of this research project is that it:

(i) Provides an alternative way to learn Al-Quran recitation, helping to create a knowledgeable society.

(ii) Facilitates students in reading Al-Quran at their own pace and time. The engine can also serve as a self-learning tool for working adults with time constraints who wish to learn Al-Quran.

(iii) Checks the tajweed rules against the stored database.

(iv) Enhances skills and understanding of Quranic reading in a faster way.

(v) Promotes Quranic literacy and explores new approaches in signal processing technology.

(vi) Supports the Quranic learning process, especially in the j-QAF educational programme, a complementary school programme that utilizes current ICT developments and the j-QAF curriculum to assist in reciting Al-Quran using interactive learning techniques.

(vii) Encourages Muslims to advance their recitations, and newly converted Muslims and students to learn and practice Islam in a more convenient and effective way.

6.3 Observations on Weaknesses and Strengths

Different observers and researchers have different opinions and views when testing and evaluating this system. That is normal, as any invented system has its own strengths and weaknesses, as described below:

6.3.1 Strengths

Generally, reliable speech recognition is a hard problem which requires a combination of many techniques. In this research, however, the alternative methods implemented were able to achieve the targeted objectives with their own strengths. The strengths of this research project are:

(i) In this modern and technological era, a speech interaction system is believed to be able to achieve the users' targeted objectives in a very easy and fast manner. An interactive speech recognition system also eases and speeds up the communication process.

(ii) The automated Tajweed Checking Rules engine for Quranic verse recitation enables the user to recite Al-Quran through the MATLAB Graphical User Interface (GUI), hear the correct recitation and hence determine the proper way to recite Al-Quran. As a result, personal improvement in reading Al-Quran can easily be determined in real time, without any delay.

(iii) This interactive engine is a self-learning educational tool that can support students in j-QAF learning, especially in learning Al-Quran (the Tasmik & Khatam al-Quran model). The engine can also ease the j-QAF teachers' work while teaching the Quranic syllabus.

(iv) This project gives more researchers a chance to get involved in work done by University of Malaya students, since students may refer to this project for their own benefit when developing systems of the same nature.

(v) It allows the interchange of ideas and collaboration between 2 or more faculties (inter-faculty) or agencies, in order to produce a magnificent product for the benefit of the Muslim community.

(vi) The engine developed shows promising results, with almost exact matches of recitors' preferences and entries.

(vii) This research has shown that the combination of MFCC feature extraction and HMM classification works well and is able to produce magnificent results in Quranic speech recognition.

(viii) The most challenging task in this research was to implement Al-Quran with a speech recognition system, together with the engine's capability of checking the tajweed rules. The engine achieved recognition rates of 91.95% (ayates) and 86.41% (phonemes), which indicates that it was successful.

6.3.2 Weaknesses

Throughout these years, the research also faced problems and difficulties, owing to limitations and weaknesses in the speech recognition research area. The weaknesses of this research project are:

(i) Implementing a Quranic speech recognition system is not an easy job, and extending it to all chapters of Al-Quran is harder still, since this technology is still new in the market; the required software and hardware might not yet be available.

(ii) Most past research was executed and implemented for the English language only. The implementation of Quranic material in speech recognition systems is thus still at an early stage and needs much improvement.

(iii) Speaker recognition is a difficult task. It is very hard to get an exact match with a high accuracy rate in many cases, especially across training and testing sessions, because these sessions can differ greatly owing to many factors, such as the human voice changing over time, health conditions (e.g. the speaker has a cold), recording environments and others.

6.4 Future Research

The engine developed in this research showed promising results, although it was only tested on a small Quranic chapter (i.e. Sourate Al-Fatihah). It is still at an early stage of research, and needs proper attention and improvement to make it more capable and useful to end users. The Quranic implementation of speech recognition systems, especially for checking the Tajweed rules, will always present new developments in this technology, allowing more researchers and more creativity to get involved. Many things should be considered in order to improve the system further in the future. Below are the proposed tasks for improving the engine.

(i) The engine shall be able to accept more test cases from various users' Quranic recitation inputs. The engine must be multi-user, accepting voice input from different people, in order to develop a larger evaluation database.

(ii) The engine shall be integrated with hardware, allowing users to use the engine in a real system (portable device) rather than as a simulation. The integration process could be costly and very time consuming, but it would result in a very effective and efficient system.

6.5 Conclusion

This research has covered many aspects of speech recognition systems, and its findings will be highly beneficial for learning Al-Quran in a more interesting manner, while complying with the established Islamic ways and rules. For recognition purposes, the recitor's recitation score was evaluated against the database for transparent evaluation, ensuring that the learning experience is optimized. This research has successfully achieved its objectives and, hopefully, will give many benefits to the end users for whom it is designed. The automated Tajweed Checking rules engine for Quranic verse recitation showed both strengths and weaknesses once it had been successfully developed. The achievements of this engine are very valuable indeed, as they will be references for other researchers and developers of such systems in the future. It is very much hoped that the engine will be implemented in real life and integrated with a hardware system.

REFERENCES

Ahmad, A.M., Ismail, S., Samaon, D.F., 2004, 'Recurrent Neural Network with
Backpropagation through Time for Speech Recognition,' IEEE International
Symposium on Communications & Information Technology, 2004. ISCIT ‘04. Volume 1,
pp. 98 – 102.

Ahmed, M.E., 1991, 'Toward an Arabic Text-To-Speech system', The Arabian Journal for Science and Engineering, 1991.

Anwar, M.J., Awais, M.M., Masud, S. & Shamail, S.,” Automatic Arabic Speech
Segmentation System.” Department of Computer Science, Lahore University of
Management Sciences, Lahore, Pakistan.

Bashir, M.S., Rasheed, S.F., Awais, M.M., Masud, S., & Shamail, S., 2003,'Simulation of
Arabic Phoneme Identification through Spectrographic Analysis,' Department of
Computer Science, University of Engineering & Technology, Lahore Pakistan, Lahore
Pakistan.

Bateman, D., Bye, D. and Hunt, M., 1992, 'Spectral Contrast Normalization and Other Techniques for Speech Recognition in Noise', Proc. IEEE Inter. Conf. Acoustic Speech Signal Process, vol. 1, pp. 241-244, 1992.

Chetouani, M., Gas, B., Zarader, J.L. & Chavy, C., 2002, ‘Neural Predictive Coding for
speech Discriminant Feature Extraction: The DFE-NPC’, ESANN’2002 Proceedings
– European Symposium on Artificial Neural Network, Bruges, Belgium, pp. 275-280.

Davis, S.B. & Mermelstein, P., 1980, ‘Comparison of Parametric Representations of


Monosyllabic Word Recognition in Continuously Spoken Sentences’, IEEE
Transactions on Acoustics, Speech and Signal Processing, 28, pp.357-366.

Ehab, M., Ahmad, S. and Mousa, A. 2007,'Speaker Independent Quranic Recognizer Based
on Maximum Likelihood Linear Regression,' Proceedings of World Academy of
Science, Engineering and Technology Volume 20 April 2007.

Essa, O., 1998, ‘Using Prosody in Automatic Segmentation of Speech’, Proceeding 36th
ACM Southeast Regional Conference, pp. 44 - 49, April 1998.

Essa, O.,”Using Suprasegmentals in Training Hidden Markov Models for
Arabic."Computer Science Department, University of South Carolina, Columbia.

Felber, P. 2001, 'Speech Recognition: Report of an Isolated Word Experiment', Department


of Electrical & Computer Engineering, Illinois Institute of Technology, Chicago, USA.
Available at: http://www.ece.iit/~pfelber/speechrecognition/ retrieved on 1 September
2008.

Habash, M., 1986, “How to memorize the Quran”, Dar al-Khayr, Beirut 1986.

Hansen, J.C., 2003, ‘Modulation based parameter for Automatic Speech Recognition’,
Master Thesis of Department of Electrical Engineering, University of Rhode Island,
USA.

Hasan, M.R., Jamil, M., Rabbani, M.G. & Rahman, M.S., 2004, ‘Speaker Identification
Using Mel Frequency Cepstral Coefficients’, 3rd International Conference on
Electrical & Computer Engineering ICECE 2004, 28-30 December 2004, Dhaka,
Bangladesh ISBN 984-32-1804-4 565.

Hemantha, G.K., Ravishankar, M., Nagabushan, P. & Basavaraj, S.A., 2006, ‘Hidden
Markov Model based approach for generation of Pitman shorthand language symbols
for consonants and vowels from spoken English’, Sadhana – June 2006. Vol. 31, part
3, pp. 227-290.

Hermansky, H., 1990, ‘Perceptual linear predictive (PLP) analysis of speech’, The Journal
of the Acoustical Society of America -April 1990. Volume 87, Issue 4, pp. 1738-1752.

Hosom, J.P., Cole, R. and Fanty, M. 1999, Speech Recognition Using Neural Networks at
the Center for Spoken Language Understanding, Center for Spoken Language
Understanding (CSLU) Oregon Graduate Institute of Science and Technology, July 6,
1999.

Huang, X., Acero, A., & Hon, H.W., 2001, Spoken Language Processing: A Guide to
Theory, Algorithm and System Development, Prentice Hall, Upper Saddle River, NJ,
USA.

Institute for Research in Islamic education (Newspaper), 2007, The New Strait Times
Press-26 September 2007 [Online] Available at: http://www.nst.com.my/ retrieved on
20 November 2007.

J. de Veth and L. Boves, 1998, ‘Channel normalization techniques for automatic speech
recognition over the telephone’. Speech Communication 25 (1998) 149-164.

Jurafsky, D. & Martin, J.H., 2007, Automatic Speech Recognition: Speech and Language
Processing: An Introduction to natural language processing, computational linguistics,
and speech recognition, Prentice Hall, New Jersey, USA.

Khalifa, O., Khan, S., Islam, M.R., Faizal, M. & Dol, D., 2004, ‘Text Independent
Automatic Speaker Recognition’, 3rd International Conference on Electrical &
Computer Engineering, Dhaka, Bangladesh, pp.561-564.

Kirchhoff, K., Bilmes, J., Das, S., Duta, N., Egan, M., Ji, G., He, F., Henderson, J., Liu, D., Noamany, M., Schone, P., Schwartz, R. & Vergyri, D., 2003, 'Novel approaches to Arabic speech recognition: report from the 2002 Johns-Hopkins Summer Workshop', IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003 Proceedings (ICASSP '03), Volume 1, 6-10 April 2003, pp. I-344 - I-347.

Kirchhoff, K., Vergyri, D., Bilmes, J., Duh,K. and Stolcke, A. 2004,'Morphology-based
language modeling for conversational Arabic speech recognition,' Eighth
International Conference on Spoken Language ISCA, 2004.

Lee, K.F. & Hon, H.W., 1989, ‘Speaker-Independent Phone Recognition Using Hidden
Markov Models’, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol.
31, pp.1641-1648.

Levent, M.A., 1996, “Foreign Accent Classification in American English.” Dissertation for
Doctor of Philosophy in Department of Electrical & Computer Engineering, Graduate
School of Duke University, Durham, USA.

Linde, Y., Buzo, L. & Gray, R.M., 1980,” An algorithm for Vector Quantizer Design”,
IEEE Transactions on Communications, Vol.COM28,no 1,pp.84-95.

Madisetti, V.K. & Williams, D.B., 1999, Digital Signal Processing Handbook,
CRCnetBASE. CRC Press LLC, USA.

Martens, J.P., 2002, 'Continuous Speech Recognition over the Telephone', Electronics &
Information Systems, Ghent University, Belgium. Available at:
http://trappist.elis.ugent.be/ELISgroups/speech/cost249/report/intro.pdf retrieved on 10
September 2008.

Matsui, T., & Furui, S., 1993, 'Comparison of text-independent speaker recognition
methods using VQ-distortion and discrete/continuous HMMs'. Proceedings of the 1993
International Conference on Acoustics, Speech, and Signal Processing (ICASSP),
Institute of Electrical and Electronic Engineers. Minneapolis, Minnesota, pp.157 – 160.

Maamouri, M., Bies, A. and Kulick, S., 2006,'Diacritization to Arabic Treebank Annotation
and Parsing,' Proceedings of the Conference of the Machine Translation SIG, 2006.

Nathan, K., Beigi, H.S.M. and Subrahmonia, J., 1995, ‘On-line Unconstrained Handwriting
Recognition Based On Probabilistic Techniques’.

Nelson & Kristina, 1985, ‘The art of Reciting the Quran’, University of Texas Press, 1985.

Owen, F.J., 1993, ‘Signal Processing of Speech’. Macmillan Press Ltd., London, UK.

Prime Minister's Office of Malaysia 2006, Ninth Malaysia Plan 2006 – 2010, Chapter 11:
Enhancing Human Capital. Available at: http://www.epu.jpm.my/rm9/english/
Chapter11.pdf retrieved on 18 November 2007.

Program j-QAF sentiasa dipantau (Newspaper), 2005, Berita Harian Press-10 May 2005
[Online] Available at: http://www.bharian.com.my/ retrieved on 18 November 2007.

Penutupan Majlis Tilawah al-Quran (Newspaper), 1995, Utusan Malaysia-10 January 1995.
Retrieved on 18 November 2007.

Rabiner, L.R. & Juang, B.H., 1993, ‘Fundamental of Speech Recognition’, Prentice Hall,
New Jersey, USA.

Rabiner, L.R., 1989, ‘A Tutorial on Hidden Markov Model and Selected Applications in
Speech Recognition’, Proceeding of the IEEE, Volume 7, No.2, February 1989.

Ramzi A.H., Omar E.A., 2007. ‘CASRA+: A Colloquial Arabic Speech Recognition
Application". American Journal of Applied Sciences 4(1):23-32, 2007 Science
Publication.

Sari, T., Souici, L. and Sellami, M., 2002, ‘Off-Line Handwritten Arabic Character
Segmentation Algorithm: ACSA’, Proc. Int’l Workshop Frontiers in Handwriting
Recognition, pp. 452-457, 2002.

Shen, J., Hung, J. & Lee, L., 1998, 'Robust Entropy-based Endpoint Detection for Speech Recognition in Noisy Environments', 5th International Conference ICSLP '98, Sydney, Australia, 1998.

Tabbal, H., El-Falou, W. & Monla, B., 2006, 'Analysis and Implementation of a “Quranic”
verses delimitation system in audio files using speech recognition techniques',In:
Proceeding of the IEEE Conference of 2nd Information and Communication
Technologies, 2006. ICTTA ’06.Volume 2, pp. 2979 – 2984.

Thomas, F.Q., 2002, ‘Discrete Time Speech Signal Processing’, Prentice Hall, New Jersey,
USA.

Ursin, M., 2002, ‘Triphone Clustering in Finnish Continuous Speech Recognition’, Master
Thesis, Department of Computer Science, Helsinki University of Technology, Finland.

Vergyri,D.,Kirchhoff, K. 2004, 'Automatic Diacritization of Arabic for Acoustic Modeling


in Speech Recognition,' COLING Workshop on Arabic-script Based Languages,
Geneva, 2004.

Viterbi, A.J.,1967, ‘Error bounds for convolutional codes and an asymptotically optimum
decoding algorithm,’ IEEE Trans. Information Theory, vol. IT-13, pp. 260-269, Apr.
1967.

Vuuren, S.V., 1996,’Comparison of Text-Independent Speaker Recognition Methods on


Telephone Speech with Acoustic Mismatch’, Proceeding (ICSLP)96, Vol:3,
Philadelphia, PA. pp. 1788-1791.

Yongwon, J. & Hyung, S.K., 2001,’Recognition Confidence Scoring using Recognition


Results from Perturbed Input Feature Vectors’, Electronics Letter, Volume: 37, Issue:
18, pp. 1143 – 1145.

Youssef, A. & Emam, O., 2004, ‘An Arabic TTS based on the IBM Trainable Speech
Sythesizer’, Department of Electronics & Communication Engineering, Cairo
University, Giza, Egypt.

Wai, C.C., 2003, Speech Coding Algorithm foundations and evolution of standardized
Coders, John Wiley & Sons Inc.,NJ, USA.

APPENDIX A

Signal of 8 ayates of Sourate Al-Fatihah

1) Result from 'Bismillah' utterance

[Waveform (Amplitude vs. Time [sec]) and spectrogram (Frequency [kHz] vs. Time [sec]) of the speech sample of Quranic recitation]
2) Result from 'fatihah1' utterance

[Waveform (Amplitude vs. Time [sec]) and spectrogram (Frequency [kHz] vs. Time [sec]) of the speech sample of Quranic recitation]

3) Result from 'fatihah2' utterance

[Waveform (Amplitude vs. Time [sec]) and spectrogram (Frequency [kHz] vs. Time [sec]) of the speech sample of Quranic recitation]
4) Result from 'fatihah3' utterance

[Waveform (Amplitude vs. Time [sec]) and spectrogram (Frequency [kHz] vs. Time [sec]) of the speech sample of Quranic recitation]

5) Result from 'fatihah4' utterance

[Waveform (Amplitude vs. Time [sec]) and spectrogram (Frequency [kHz] vs. Time [sec]) of the speech sample of Quranic recitation]
6) Result from ‘fatihah5’ utterance

[Waveform (Amplitude vs. Time [sec]) and spectrogram (Frequency [kHz] vs. Time [sec]) of the speech sample of Quranic recitation]

7) Result from ‘fatihah6’ utterance

[Waveform (Amplitude vs. Time [sec]) and spectrogram (Frequency [kHz] vs. Time [sec]) of the speech sample of Quranic recitation]
8) Result from ‘fatihah7’ utterance

[Waveform (Amplitude vs. Time [sec]) and spectrogram (Frequency [kHz] vs. Time [sec]) of the speech sample of Quranic recitation]
APPENDIX B

List of Published Papers and Achievements

Journal
Zaidi Razak, Noor Jamaliah Ibrahim, Mohd Yamani Idna Idris, Emran Mohd Tamil, Mohd Yakub @ Zulkifli Mohd Yusoff, and Noor Naemah Abdul Rahman, 2008, "Quranic Verse Recitation Recognition Module for Support in j-QAF Learning: A Review", IJCSNS International Journal of Computer Science and Network Security, Vol. 8, No. 8, August 2008 (in press), pp. 207-216. Journal ISSN: 1738-7906.

Proceeding

1. Noor Jamaliah Ibrahim, Zaidi Razak, Mohd Yakub @ Zulkifli Mohd Yusoff, Mohd
Yamani Idna Idris, Emran Mohd Tamil, "Quranic verse Recitation feature
extraction using Mel-Frequency Cepstral Coefficients (MFCC)", In Proceedings of
the 4th IEEE International Colloquium on Signal Processing and its Application
(CSPA) 2008, 7-9 March 2008, Kuala Lumpur, MALAYSIA.

2. Noor Jamaliah Ibrahim, Mohd.Yakub@Zulkifli Mohd Yusoff & Zaidi Razak, 2008
"Quranic verse Recitation Recognition Module for Educational Programme",
International Seminar on Research in Islamic Studies 2008 @ ISRIS '08, 17-18
December 2008, Kuala Lumpur, MALAYSIA.

Awards

Gold Medal - Mohd Yakub @ Zulkifli Bin Haji Mohd Yusoff, Zaidi Razak, Noor Jamaliah
Binti Ibrahim, Mohd Yamani Idna Idris, Emran Mohd Tamil & Noorzaily Mohamed Noor,
“Effective Learning of Quranic Verse Recitation Using Automated Tajweed Checking
Rules Educational Tools”, 20th International Invention, Innovation and Technology
Exhibition ITEX 2009, Kuala Lumpur, Malaysia, 15-17 May 2009.
