CN 711 Speech Recognition
Course Instructor: Dr. M. Sabarimalai Manikandan
E-mail: msm.sabari@gmail.

CN 711: Speech Recognition Course Topics
Course Objectives:
This course introduces digital speech processing and its applications. It offers a practical and theoretical understanding of how human speech can be processed by computers, covering speech analysis and synthesis, speech features, speech and speaker recognition, and applications. In the practical component, students build a working text-to-speech system in their native language, build speech recognition systems, create their own synthetic voice, and build a complete telephone spoken dialogue system.

A. Review of Basic DSP Concepts

B. Introduction to Speech Signals
Speech production mechanism
Types of sounds, vowels and consonants
Loudness, sound pressure
Nature of the speech signal, models of speech production
Silence, voiced and unvoiced speech
Naturalness and intelligibility
Speech data acquisition system
Why speech processing
Speech perception model

C. Speech Analysis and Synthesis
Short-time Fourier analysis, spectrogram
Autocorrelation and cross-correlation
Human speech production model
Temporal and spectral characteristics
Linear prediction (LP) filter theory
All-pole filter, inverse filtering
Formants and pitch determination
LP residuals and Hilbert transform
Vocal tract length normalization

D. Speech Enhancement, Coding and Quality Assessment
Acoustic echo cancellation
Reverberant speech enhancement
Removal of different types of noise and artifacts
Speech coding
Subjective and objective metrics

E. Speech Features for Recognition
Temporal and short-time Fourier transform features
Teager energy based features, entropy
Cepstral coefficients
Linear prediction-based cepstral coefficients (LPCC)
Mel frequency cepstral coefficients (MFCCs)
AM-FM features, time-frequency analysis
Wavelet octave coefficients of residues (WOCR)
Voice activity detection
Silence, voiced, and unvoiced speech classification

F. Speech Recognition
Signal processing, template matching
Phoneme recognition
HMMs, acoustic modeling, language modeling
Continuous and emotional speech recognition
Performance evaluation

G. Speaker Recognition
Basic ASR system
Closed-set and open-set ASR system
Speaker identification and verification
Text-independent and text-dependent recognition
Mean normalization, feature smoothing
Dynamic time warping (DTW), vector quantization
Gaussian mixture models (GMMs) and universal background model (UBM)
Log-likelihood ratio (LLR)
False acceptance probability, false rejection probability
Detection error trade-off (DET) curve
Equal error rate (EER)

H. Speech Preprocessing Applications
Voice conversion, text-to-speech synthesis
Spoken dialogue system, interactive voice response (IVR) system
Identify Your ID

Textbooks and Materials

[1]. Li Tan, Digital Signal Processing: Fundamentals and Applications, Elsevier, 2008.
[2]. N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Prentice Hall, 1984. ISBN 0132119137.
[3]. L. R. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993. ISBN 0130151572.
[4]. L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.
[5]. J. L. Flanagan, Speech Analysis, Synthesis and Perception, 2nd edition, Springer-Verlag, 1972.
[6]. F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997.
[7]. D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall, 2000.
[8]. T. F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, Prentice Hall, 2001.
[9]. J. R. Deller, J. H. L. Hansen, and J. G. Proakis, Discrete-Time Processing of Speech Signals, 2nd edition, IEEE Press, 2000.
[10]. T. W. Parsons, Voice and Speech Processing, McGraw-Hill, 1987.
[11]. X. Huang, A. Acero, H. Hon, and R. Reddy, Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Prentice Hall, 2001.

[12]. Instructor's Notes

Programming Languages: MATLAB and Java Media Framework

Important Standard Journals in the Field of Audio and Speech Processing
IEEE Transactions on Audio, Speech and Language Processing
IEEE Transactions on Signal Processing
IEEE Signal Processing Magazine
IEEE Transactions on Information Forensics and Security
ACM Transactions on Speech and Language Processing
IEEE Multimedia
Speech Communication (by Elsevier)
IEEE Signal Processing Letters
Signal Processing (by Elsevier)
Digital Signal Processing (by Elsevier)
International Journal of Speech Technology (by Springer)
Signal, Image and Video Processing (by Springer)
Computer Speech and Language
EURASIP Journal on Audio, Speech, and Music Processing
Journal of the Acoustical Society of America (JASA)
Audio Engineering Society

Important Conferences in the Field of Audio and Speech Processing
IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
Eurospeech
Int. Conf. on Spoken Language Processing (ICSLP)
Acoustical Society of America