
UNIVERSITY MALAYSIA PERLIS
SCHOOL OF MECHATRONIC ENGINEERING
FINAL YEAR PROJECT PROPOSAL

Development of an Automatic Subtitling System


Dr. Shafriza

YIAP CHIEN WERN

081061198

Synopsis

Being able to enter commands and words without physical input such as typing or writing is of great benefit to people who are physically handicapped. Speech input is also useful in applications that require an instant response or several actions in a short time, such as air traffic control, aircraft cockpit control and the medical domain. A speech signal conveys not only the linguistic message but also much other information, such as the speaker's age, gender, social background and even regional origin [1]. This project will not cover such a wide scope. Data acquisition will be done under controlled conditions using a single speaker, and the processed voice signal will then be compared with our own library.

In general, there are three common approaches to speech recognition: Dynamic Time Warping (DTW), Hidden Markov Models (HMMs) and Artificial Neural Networks (ANNs) [2]. DTW finds the optimal alignment between two time series when one of them may be warped non-linearly by stretching or shrinking it along its time axis. This warping can then be used to find corresponding regions in the two time series or to measure how similar they are [3]. There are two strong reasons why HMMs are widely used. First, the models are very rich in mathematical structure and can therefore form the theoretical basis for a wide range of applications. Second, when applied properly, the models work very well in practice for several important applications [4]. ANNs are now used widely because of their parallel distributed processing, distributed memory, tolerance to errors and ability to learn and distinguish patterns [5]. ANNs are also fast at recognition time: once trained, a network computes its output simply by multiplying the present input by the adjusted weights. At present, the TDNN (Time-Delay Neural Network) is widely used in speech recognition [6].
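A TDNN is built from time-delay layers. Each output unit looks at a short window of consecutive input frames, for example the current frame and the next two. The same weights are reused at every time step, so a pattern is detected wherever it occurs in the utterance. The sketch below shows the forward pass of one such layer in plain Python. The delays, layer sizes, random (untrained) weights and the tanh activation are illustrative assumptions, and training is not shown.

```python
import math
import random


def tdnn_layer(frames, weights, biases, delays):
    """Forward pass of one time-delay layer (illustrative sketch).

    frames  : list of T input frames, each a list of D features
    weights : weights[k][j][i] multiplies feature i of frame t + delays[k]
              for output unit j (shared across all time steps t)
    biases  : one bias per output unit
    delays  : frame offsets seen by every output, e.g. [0, 1, 2]
    Returns one output vector per valid time step (T - max(delays) of them).
    """
    span = max(delays)
    outputs = []
    for t in range(len(frames) - span):
        out = []
        for j, b in enumerate(biases):
            s = b
            for k, d in enumerate(delays):
                s += sum(w * x for w, x in zip(weights[k][j], frames[t + d]))
            out.append(math.tanh(s))  # tanh is an assumed activation
        outputs.append(out)
    return outputs


# Hypothetical sizes: 10 frames of 4 features -> 3 hidden units, delays 0..2.
random.seed(0)
T, D, H, delays = 10, 4, 3, [0, 1, 2]
frames = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
weights = [[[random.gauss(0, 0.5) for _ in range(D)] for _ in range(H)]
           for _ in delays]
biases = [0.0] * H
hidden = tdnn_layer(frames, weights, biases, delays)
print(len(hidden), len(hidden[0]))  # 8 3
```

Because the weights are shared over time, shifting the input by one frame simply shifts the output by one step. This time-shift invariance is what makes TDNNs suited to phoneme recognition.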
Because these methods differ in complexity, a study of their pros and cons is required to determine which one is best suited to this project.
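The DTW alignment described in the synopsis gives a concrete example of what such a complexity comparison involves. DTW is usually computed by dynamic programming over an n-by-m cost table. Comparing an utterance of n frames with a template of m frames therefore takes O(nm) time. The minimal sketch below follows the classic formulation rather than the derivative DTW of [3]. It assumes scalar per-frame features and an absolute-difference local cost, whereas real systems compare a feature vector per frame.

```python
import math


def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW distance between sequences a and b (sketch).

    D[i][j] is the cost of the best warping path aligning a[:i] with b[:j];
    each step may advance in a, in b, or in both (stretching/shrinking time).
    Uses O(len(a) * len(b)) time and memory.
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # a advances, b held
                                 D[i][j - 1],      # b advances, a held
                                 D[i - 1][j - 1])  # both advance
    return D[n][m]


# A slowed-down copy of a pattern still matches it perfectly under DTW,
# even though a frame-by-frame comparison would need equal lengths.
template = [0, 1, 2, 3, 2, 1, 0]
slow = [0, 0, 1, 1, 2, 2, 3, 3, 2, 1, 0]
other = [3, 2, 1, 0, 1, 2, 3]
print(dtw_distance(template, slow))   # 0.0
print(dtw_distance(template, other))  # positive: a genuinely different shape
```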

Objective

The objective of this project is to develop an algorithm for an automatic subtitling system.

Scope

The project requires developing an algorithm for an automatic subtitling process based on the Hidden Markov Model (HMM), operating in a controlled environment. The project consists of the following parts and tasks:

1. Performing research and a literature review on speech recognition techniques
2. Determining the best method for the voice recognition system
3. Performing the subtitling process based on the developed speech recognition algorithm
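The scope makes the HMM the basis of the recognizer, so its core scoring step is sketched here. In the isolated-word setup described in Rabiner's tutorial [4], one HMM is trained per word. An unknown utterance is assigned to the word whose model gives it the highest likelihood, computed with the forward algorithm. The sketch assumes discrete observations, with each frame already vector-quantized to a codebook index. It works in the log domain to avoid underflow on long utterances. The two word models, "yes" and "no", and all their parameters are hypothetical, and training them (for example with Baum-Welch) is not shown.

```python
import math


def log_forward(obs, pi, A, B):
    """log P(obs | model) for a discrete HMM via the forward algorithm.

    pi[i]   : probability of starting in state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability that state i emits codebook symbol k
    obs     : list of symbol indices
    """
    def log(p):
        return math.log(p) if p > 0 else -math.inf

    def logsumexp(xs):
        m = max(xs)
        if m == -math.inf:
            return m
        return m + math.log(sum(math.exp(x - m) for x in xs))

    n = len(pi)
    # alpha[i] = log P(obs[0..t], state at time t = i)
    alpha = [log(pi[i]) + log(B[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [logsumexp([alpha[i] + log(A[i][j]) for i in range(n)])
                 + log(B[j][o]) for j in range(n)]
    return logsumexp(alpha)


# Two hypothetical left-to-right word models over a 3-symbol codebook.
models = {
    "yes": ([1.0, 0.0],
            [[0.7, 0.3], [0.0, 1.0]],
            [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "no": ([1.0, 0.0],
           [[0.7, 0.3], [0.0, 1.0]],
           [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}
utterance = [0, 0, 1, 1, 1]
scores = {word: log_forward(utterance, *m) for word, m in models.items()}
print(max(scores, key=scores.get))  # yes
```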

Methodology

1. Audio recording and utterance detection
2. Pre-filtering (pre-emphasis, normalization, banding, etc.)
3. Framing and windowing (chopping the data into a usable format)
4. Filtering (further filtering of each window/frame/frequency band)
5. Comparison and matching (recognizing the utterance)
6. Action (performing the function associated with the recognized pattern) [7]
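The front end of this pipeline (steps 1 to 3) can be sketched in plain Python, although the project itself plans to use MATLAB. The sketch uses common textbook choices that the proposal does not specify, so each one is an assumption:

1. A first-order pre-emphasis filter y[n] = x[n] - 0.97 x[n-1].
2. Peak normalization.
3. 25 ms frames with a 10 ms hop.
4. A Hamming window on each frame.
5. A crude energy threshold (10% of the loudest frame) for utterance detection.

Utterance detection is applied to the framed signal here, which is a common simplification. Banding and the per-band filtering of step 4 are not shown.

```python
import math


def pre_emphasis(x, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1]."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def normalize(x):
    """Scale the signal so its peak absolute amplitude is 1."""
    peak = max(abs(v) for v in x) or 1.0  # avoid dividing an all-zero signal
    return [v / peak for v in x]


def hamming_frames(x, frame_len, hop):
    """Chop x into overlapping frames and apply a Hamming window to each."""
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
         for n in range(frame_len)]
    return [[x[s + n] * w[n] for n in range(frame_len)]
            for s in range(0, len(x) - frame_len + 1, hop)]


def detect_utterance(frames, ratio=0.1):
    """Return (first, last) indices of frames whose energy exceeds
    ratio * (largest frame energy), or None if every frame is silent."""
    energy = [sum(v * v for v in f) for f in frames]
    threshold = ratio * max(energy)
    active = [i for i, e in enumerate(energy) if e > threshold]
    return (active[0], active[-1]) if active else None


# Hypothetical test signal at 8 kHz: 0.2 s silence, 0.3 s of a 440 Hz tone,
# then 0.2 s silence.
fs = 8000
silence = [0.0] * (fs // 5)
tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(3 * fs // 10)]
signal = silence + tone + silence

x = normalize(pre_emphasis(signal))
frame_len, hop = fs // 40, fs // 100  # 25 ms frames, 10 ms hop
frames = hamming_frames(x, frame_len, hop)
start, end = detect_utterance(frames)
print(f"{len(frames)} frames, speech from {start * hop / fs:.2f} s "
      f"to {(end * hop + frame_len) / fs:.2f} s")
```

The detected span brackets the tone to within about one frame. Real recordings contain background noise, so the fixed 10% ratio would normally be tuned or replaced by a threshold estimated from the leading silence.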

References

[1] M. Benzeghiba, R. De Mori, O. Deroo, S. Dupont, T. Erbes, D. Jouvet, L. Fissore, P. Laface, A. Mertins, C. Ris, R. Rose, V. Tyagi and C. Wellekens, "Automatic speech recognition and speech variability: A review," Multitel, Parc Initialis, Avenue Copernic, B-7000 Mons, Belgium.

[2] Karina Vieira, Bogdan Wilamowski and Robert Kubichek, "Speaker Verification for Security Systems Using Artificial Neural Networks," IEEE Trans., pp. 1102-1105, 2003.

[3] E. Keogh and M. Pazzani, "Derivative Dynamic Time Warping," in Proc. of the First SIAM Intl. Conf. on Data Mining, Chicago, Illinois, 2001.

[4] Lawrence R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition."

[5] Bahlmann, Haasdonk and Burkhardt, "Speech and audio recognition," IEEE Trans., vol. 11, May 2003.

[6] Edward Gatt, Joseph Micallef, Paul Micallef and Edward Chilton, "Phoneme Classification in Hardware Implemented Neural Networks," IEEE Trans., p. 481, 2001.

[7] http://tldp.org/HOWTO/Speech-Recognition-HOWTO/inside.html

Expected Budget

The expected budget for this project is around RM50, since the only hardware required is a microphone for recording sound waves.

Activities

Activity                          Estimated duration
Research on speech recognition    2
Voice recording                   1
Signal analysis                   2
Library compilation               2
Build algorithm                   3
Testing and debugging             2
Real-time operation               1

Gantt chart

[Gantt chart: activities 1 to 7 (research on speech recognition, voice recording, signal analysis, library compilation, build algorithm, testing and debugging, real-time operation) scheduled across months 1 to 6]

Expected Result

By the end of this project, a suitable process for developing a speech recognition algorithm, using either an HMM or a neural network, will have been established. The program will be built in MATLAB, in which most of the signal analysis can be carried out and tested. The finished program will be able to process an audio signal and match it against the data library, thereby obtaining the correct words or phrases from the audio data.
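The final matching step can be illustrated with nearest-template matching plus a rejection threshold, shown here in Python rather than MATLAB. The sketch assumes each recording has already been turned into a sequence of per-frame feature vectors. To stay short, it uses linear time normalization (stretching every sequence to the same number of frames) instead of the DTW or HMM scoring discussed above. In a fuller system either of those would replace `distance()`. The library contents, feature values, the 20-frame resampling length and the 0.3 threshold are all hypothetical.

```python
import math


def resample(seq, n):
    """Linearly stretch or shrink a sequence of feature vectors to n frames
    (n >= 2)."""
    if len(seq) == 1:
        return [list(seq[0]) for _ in range(n)]
    out = []
    for k in range(n):
        pos = k * (len(seq) - 1) / (n - 1)
        i = min(int(pos), len(seq) - 2)
        frac = pos - i
        out.append([(1 - frac) * a + frac * b
                    for a, b in zip(seq[i], seq[i + 1])])
    return out


def distance(a, b, n=20):
    """Mean Euclidean distance per frame after length normalization."""
    ra, rb = resample(a, n), resample(b, n)
    return sum(math.dist(u, v) for u, v in zip(ra, rb)) / n


def match(features, library, max_distance):
    """Return the library label closest to `features`, or None when even
    the best template is too far away (e.g. an unknown word)."""
    label, best = None, math.inf
    for name, templates in library.items():
        for t in templates:
            d = distance(features, t)
            if d < best:
                label, best = name, d
    return label if best <= max_distance else None


# Hypothetical library: one recorded template per word, 2 features per frame.
library = {
    "hello": [[[0.1, 0.2], [0.8, 0.5], [0.9, 0.7], [0.3, 0.2]]],
    "stop": [[[0.9, 0.9], [0.2, 0.1], [0.1, 0.1]]],
}
# A slightly slower "hello" from the same speaker.
spoken = [[0.1, 0.2], [0.5, 0.3], [0.8, 0.5], [0.9, 0.7], [0.6, 0.4], [0.3, 0.2]]
print(match(spoken, library, max_distance=0.3))  # hello
```

The rejection threshold lets the program output nothing rather than a wrong word when the input is not in the library. This matters for subtitles, where a visibly wrong caption is usually worse than a missing one.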
