
CS 582

Intro to Speech Processing

Chuck Konopka CKonopka@mail.sdsu.edu


M W 2:00-3:15pm EBA 439
Lecture I: Administration/Organization, An Introduction to the Topic
Wed., 1.21.15

Grading Criteria

3 homework assignments: 15%
1 take-home midterm exam: 20%
1 take-home final exam: 30%
1 semester team project: 35%
Extra Credit Opportunities: Up to 10%

Major Topics
Modeling
Acoustic Theory of Speech Production and Perception

(How we model the speech and hearing processes)

Acoustic-Phonetics

(How we model acoustic units of speech)

Time-Frequency Analysis

(Techniques for converting and analyzing time-domain data in other domains)

Speech pre-processing

(An implementation of time-frequency analysis for speech data)

Supervised Learning

(How computers learn using examples)

Unsupervised Learning

(How computers can learn from data independently)

Speech Structure

Rule-based Grammar
Statistical Grammar

The Meaning in Speech

Semantic Nets, etc.

Syllabus
(Weeks 1-8)

Week

Subject

1

Course Introduction

The Really Big Picture: What is Modeling?

Demonstration/Lab: (Using the CSLU Toolkit to build a simple working speech recognition system)

Selection of Semester Project (1-2 pages)

2-3

The Big Picture

The physical model of speech recognition: Speech production and perception

Deriving a computational model of speech recognition from the physical model

4-6

Machine Learning

Supervised Learning

Unsupervised Learning

Machine Learning Lab (An application of Matlab or Java-based software to a learning problem)

Midterm review and exam

Take-home midterm exam

Midterm report on Semester Project progress (1-2 pages)

Syllabus
(Weeks 9-16)

Week

Subject

8-10

Hidden Markov Models

The famous 3 lectures

HMM Lab: Simple implementations of key HMM algorithms

11-14

Speech pre-processing

An introduction to Time-frequency analysis

The Fourier Transform (FT) and the Fast Fourier Transform (FFT)

The Wavelet transform (WT) and the Wavelet Packet Transform (WPT)

The Cepstral Transform (CT)

Mel-frequency Cepstral Coefficient Analysis (MFCC)

Signal Processing Lab: Implementation of a signal processing algorithm

14-15

Language Modeling

Rule-based grammar: CFG (Context Free Grammar)

Stochastic grammar: N-Gram models, Probabilistic Context Free Grammar

Semantics

16

Finals Week

Take-home final

Semester Project due at week's end

Speech Processing Applications


Examples of speech processing applications include:

Speech recognition: Dragon Dictate, SAPI (Microsoft's Speech API), CSLU Toolkit
Speech synthesis: CSLU Toolkit, AT&T Natural Voices
Speech effects: pitch bending, chorus effects
Grammar modeling: Synthetic Shakespeare
Speaker recognition
Acoustic biometrics
Accent recognition
Language training

Resources
(Things you will need)

Textbook:
Speech and Language Processing, 2nd Edition, Jurafsky & Martin, Prentice Hall, 2009
Matlab/Octave, Audacity, Java, C++, etc.
Various papers to be announced

The Semester Project

The goal of the Semester Project is to apply and generalize the presented concepts by developing a
Big Idea in a team setting.

Big Ideas: some examples of prior Semester Projects:

Synthetic Shakespeare
The Cocktail Party Effect
Concatenative Speech Synthesis
Prosody Detection & Synthesis
Accent Recognition
Harmony Generation
Emotion Recognition
Synthetic Beatles, Beethoven, etc.

Along the way, you'll:

Develop the Big Idea into something you can implement


Develop research and writing skills
Develop team building and coordination skills

In Brief

This course will introduce you to the fundamentals of speech processing and how these concepts can be applied to other problem domains.

The Big Idea


A perfect understanding of how we understand speech isn't required to build a system that can recognize and produce speech.
It is possible to use speech data itself to build a system that can recognize and produce speech.
The Big Idea is that it is possible to create a solution using the problem data itself.
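
A toy illustration of building a solution from the problem data itself, in the spirit of the "Synthetic Shakespeare" project idea. This is only a sketch under stated assumptions: it works on text rather than audio, it uses a simple bigram (word-pair) model, and the function names and the one-line corpus are hypothetical, chosen for illustration and not taken from the course materials.

```python
import random
from collections import defaultdict


def train_bigram_model(text):
    """Record, for every word, the words observed to follow it."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, length, seed=0):
    """Produce up to `length` words by sampling observed followers."""
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    word = start
    output = [word]
    for _ in range(length - 1):
        candidates = followers.get(word)
        if not candidates:
            break  # this word was never followed by anything in the data
        word = rng.choice(candidates)
        output.append(word)
    return " ".join(output)


# Hypothetical training data: the model knows nothing except this line.
corpus = "to be or not to be that is the question"
model = train_bigram_model(corpus)
print(generate(model, "to", 8))
```

No grammar rules are written by hand here. Every word transition the generator can make was observed in the training text, so a larger corpus yields richer output without any change to the code.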

A Quick Overview
Nature's Model:

We'll begin with a definition of a model. We'll then take a look at the biological models of speech production and perception that serve as the basis for the computational models of speech.

A Computational Model:

Once we understand the Natural Model, we'll proceed to develop a computational model.

How:

We'll develop a hierarchy of the building blocks of speech and then build a system using these components.

These elements are:

The acoustic (audio) elements


The phonetic elements
The structure of speech (i.e. grammar)
The meaning in speech (i.e. semantics)
