CN 711 Speech Recognition Course
Instructor: Dr. M. Sabarimalai Manikandan
E-mail: msm.sabari@gmail.

CN 711: Speech Recognition Course Topics
Course Objectives:
This course provides an introduction to the field of digital speech processing and its applications. It offers a practical and theoretical understanding of how human speech can be processed by computers. It covers speech analysis and synthesis, speech features, speech and speaker recognition, and applications. The course includes practical work in which students build a working text-to-speech system in their native language, build speech recognition systems, create their own synthetic voice, and build a complete telephone spoken dialogue system.

A. Review of Basic DSP Concepts

B. Introduction to Speech Signals
    Speech production mechanism
    Types of sounds, vowels and consonants
    Loudness, sound pressure
    Nature of the speech signal, models of speech production
    Silence, voiced and unvoiced speech
    Naturalness and intelligibility
    Speech data acquisition system
    Why speech processing
    Speech perception model

C. Speech Analysis and Synthesis
    Short-time Fourier analysis, spectrogram
    Autocorrelation and cross-correlation
    Human speech production model
    Temporal and spectral characteristics
    Linear prediction (LP) filter theory
    All-pole filter, inverse filtering
    Formants and pitch determination
    LP residuals and Hilbert transform
    Vocal tract length normalization

D. Speech Features for Recognition
    Temporal and short-time Fourier transform features
    Teager energy based features, entropy
    Cepstral coefficients
    Linear prediction-based cepstral coefficients (LPCC)
    Mel frequency cepstral coefficients (MFCCs)
    AM-FM features, time-frequency analysis
    Wavelet Octave Coefficients of Residues (WOCR)
    Voice activity detection
    Silence, voiced, and unvoiced speech classification

E. Speech Enhancement, Coding and Quality Assessment
    Acoustic echo cancellation
    Reverberant speech enhancement
    Removal of different types of noise and artifacts
    Speech coding
    Subjective and objective metrics


F. Speech Recognition
    Signal processing, template matching
    Phoneme recognition
    HMMs, acoustic modeling, language modeling
    Continuous and emotional speech recognition
    Performance evaluation

G. Speaker Recognition
    Basic ASR system
    Closed-set and open-set ASR system
    Speaker identification and verification
    Text-independent and text-dependent recognition
    Mean normalization, feature smoothing
    Dynamic Time Warping (DTW), vector quantization
    Gaussian Mixture Models (GMMs) and Universal Background Model (UBM)
    Log-Likelihood Ratio (LLR)
    False acceptance probability, false rejection probability
    Detection Error Trade-off (DET) curve
    Equal Error Rate (EER)

H. Speech Preprocessing Applications
    Voice conversion, text-to-speech synthesis
    Spoken dialogue system, Interactive Voice Response (IVR) system
    Identify Your ID
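The verification metrics listed under Speaker Recognition (false acceptance probability, false rejection probability, and EER) can be illustrated with a small sketch. It is written in Python purely for illustration, which is an assumption, since the course itself lists MATLAB. The function names and the score values are hypothetical and do not come from the course material. The sketch assumes a trial is accepted when its score (for example, an LLR) is at or above a threshold, and it approximates the EER by scanning the observed scores as candidate thresholds.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) for the rule: accept a trial if score >= threshold."""
    # False acceptance: an impostor trial is accepted.
    far = sum(s >= threshold for s in impostor) / len(impostor)
    # False rejection: a genuine (target speaker) trial is rejected.
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by trying every observed score as a threshold.

    Returns (eer, threshold) at the point where |FAR - FRR| is smallest;
    the EER is reported as the average of FAR and FRR at that point.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Made-up scores for illustration only (not real experimental data).
genuine_scores = [2.1, 1.7, 0.9, 1.4, 0.2]
impostor_scores = [-1.3, -0.4, 0.5, -2.0, 1.0]

eer, thr = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2f} at threshold {thr:.2f}")  # prints: EER = 0.20 at threshold 0.90
```

Sweeping the threshold over a finer grid and plotting FAR against FRR at each value gives the points of a DET curve. The EER is the point on that curve where the two error rates coincide.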

Textbooks and Materials

[12]. Instructor's Notes

Programming Languages: MATLAB and Java Media Framework

