Willkommen bei Scribd!

Speech Processing 15-492/18-492: Speech Recognition Template Matching

Hochgeladen von

0% fanden dieses Dokument nützlich (0 Abstimmungen)

47 Ansichten24 Seiten

Template matching is a simple speech recognition technique that compares an input audio sample against stored templates. Dynamic time warping (DTW) allows the templates and samples to be of different durations by warping them for alignment. DTW works well for small vocabularies (<20 words) but larger vocabularies require extending the template model, such as stringing phoneme templates together. Reliability can be improved by averaging over multiple template examples and using distance metrics like Mahalanobis that account for variance.

Originalbeschreibung:

Originaltitel

04 Asr Template

Copyright

Verfügbare Formate

PDF, TXT oder online auf Scribd lesen

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Dieses Dokument melden

Copyright:

Attribution Non-Commercial (BY-NC)

Verfügbare Formate

Als PDF, TXT herunterladen oder online auf Scribd lesen

Markieren Sie unangemessene Inhalte

0% fanden dieses Dokument nützlich (0 Abstimmungen)

47 Ansichten24 Seiten

Speech Processing 15-492/18-492: Speech Recognition Template Matching

Hochgeladen von

Shobhit Pradhan

Copyright:

Attribution Non-Commercial (BY-NC)

Verfügbare Formate

Als PDF, TXT herunterladen oder online auf Scribd lesen

Markieren Sie unangemessene Inhalte

Zu Seite

Sie sind auf Seite 1von 24

Im Dokument suchen

Speech Processing 15-492/18-492

Speech Recognition Template matching

Speech Recognition by Templates

A little history Matching Templates DTW (Dynamic Time Warping) Beyond template matching

Radio Rex (1922)

Toys always lead technology Call Rex and he comes out of his kennel

(Crystalradio.com and Rhys Jones)

Toy ASRTricks
Radio Rex
Recognizes vowel formants in EH

Voice activated toy train

Multilingual stop/go hashire/tomate

Toys pets dont need perfect ASR

Template Matching
Record templates from user
Store in library

Record ASR example

Compare against each library template

Select closest example For example

On a voice dialing system

Voice Dialing System

Library
Mom Dad Bob Marios Pizza

Lets Go Bus Information System

Matching in Time Domain

Duration
Will discriminate some examples But Mom, Bob and Dad will be confused

What about spectral properties

Matching in Frequency Domain

Mom

Bob

Different deliveries
We change durations
Two utterances are never the same

When it fails we change our delivery

Become more articular clearer

Dynamic Time Warping

Template

Sample Speech

DTW algorithm
Template
i i-1 j-1 j

Sample
For each square Dist(template[i],sample[j]) + smallest_of (Dist(template[i-1],sample[j]) Dist(template[i],sample[j-1]) Dist(template[i-1],sample[j-1]) Remember which choice your took (count path)

Multiple Templates
Compare against each Find closest Need to normalize scores
(divide by length of matches)

Matching Templates
Template Library Sample Word0 Word1 Word2

For Word in Templates Score = dtw(Template[Word], Sample); if (Score < BestScore) BestWord = Word; DoAction(Action[BestWord])

DTW issues
What happens with no-matches
Need to deal with none of the above

What happens with more templates

Harder to choose between Once variance greater than differences

Choose templates that are very different

DTW/Template Applications
Voice dialer Simple command and control Speaker ID

Speaker ID
Template Library Sample Speaker0 Speaker1 Speaker2

For Speaker in Templates Score = dtw(Template[Speaker], Sample); if (Score < BestScore) BestSpeaker = Speaker;

DTW
Advantages
Works well for small number of templates (<20) Language independent Speaker specific Easy to train (end user controls it)

Disadvantages
Limited number of templates Speaker specific Need actual training examples

More reliable matching

Distance metric
Euclidean

But some distances are bigger than others

Silence is pretty similar Fricatives are quite larger
A longer fricative might give large score A longer vowel might give smaller score

More reliable matching

Having multiple template examples
Individual matches or Average them together

DTW align all of the examples Collect statistics as a Gaussian

Mean and standard deviation for each coeff

More reliable distances

Instead of Euclidean distance
Doesnt care about the standard deviation

Use Mahalanobis distance

Care about means and standard deviation

Extending Template matching

String word templates together
Need to find word segmentation Word0 Word1 Word2

But there are many words

Extending template model

String phoneme templates together
A template model for each phoneme Sample k ae t Phoneme Templates Phone0 Phone1 Phone2

Summary
Speech Recognition by Templates
Good for simple small vocabulary tasks

Dynamic Time Warping (DTW)

Can match different durational examples

Averaging over multiple models Distance metrics

Euclidean vs Mahalanobis

Das könnte Ihnen auch gefallen

Visual Word: Unlocking the Power of Image Understanding
Von Everand
Visual Word: Unlocking the Power of Image Understanding
Fouad Sabry
Noch keine Bewertungen
Automatic Speech Recognition
Dokument35 Seiten
Automatic Speech Recognition
Bhem Kumar
Noch keine Bewertungen
Introduction to SystemVerilog
Von Everand
Introduction to SystemVerilog
Ashok B. Mehta
Noch keine Bewertungen
Automatic Speech Recognition
Dokument34 Seiten
Automatic Speech Recognition
Rishabh Purwar
Noch keine Bewertungen
Write: Get Unlimited Access To The Best of Medium For Less Than $1/week
Dokument19 Seiten
Write: Get Unlimited Access To The Best of Medium For Less Than $1/week
khushal
Noch keine Bewertungen
Speech Recognition1
Dokument39 Seiten
Speech Recognition1
chayan_m_shah
100% (1)
Large Language Models: Dr. Asgari, Dr. Rohban, Soleymani Fall 2023
Dokument53 Seiten
Large Language Models: Dr. Asgari, Dr. Rohban, Soleymani Fall 2023
caxoxi6154
Noch keine Bewertungen
Automatic Speech Recognition
Dokument30 Seiten
Automatic Speech Recognition
anand_sesham
Noch keine Bewertungen
Text Preprocessing
Dokument59 Seiten
Text Preprocessing
saisuraj1510
Noch keine Bewertungen
Information Retrieval: Text Processing
Dokument43 Seiten
Information Retrieval: Text Processing
firas
Noch keine Bewertungen
Presentation On Speech Recognition
Dokument11 Seiten
Presentation On Speech Recognition
aditya_4_sharma
Noch keine Bewertungen
Lecture 3-Term Vocabulary and Posting Lists
Dokument26 Seiten
Lecture 3-Term Vocabulary and Posting Lists
Yash Gupta
Noch keine Bewertungen
GGDSD College: Speech Recognition
Dokument33 Seiten
GGDSD College: Speech Recognition
Deep Saini
Noch keine Bewertungen
Design and Implementation
Dokument74 Seiten
Design and Implementation
Em
Noch keine Bewertungen
A Brief Introduction To Automatic Speech Recognition
Dokument22 Seiten
A Brief Introduction To Automatic Speech Recognition
Pham Thanh Phu
Noch keine Bewertungen
AI6122 Topic 1.2 - WordLevel
Dokument63 Seiten
AI6122 Topic 1.2 - WordLevel
Yujia Tian
Noch keine Bewertungen
05 Introduction To NLP
Dokument63 Seiten
05 Introduction To NLP
Manish kumawat
Noch keine Bewertungen
Chapter Two: Text Operations
Dokument41 Seiten
Chapter Two: Text Operations
endris yimer
Noch keine Bewertungen
Automatic Speech Recognition (ASR) : Omar Khalil Gómez - Università Di Pisa
Dokument65 Seiten
Automatic Speech Recognition (ASR) : Omar Khalil Gómez - Università Di Pisa
Danut Simionescu
100% (1)
Eusipco 2015 7362666
Dokument5 Seiten
Eusipco 2015 7362666
bluenemo
Noch keine Bewertungen
Introduction To Computers and Programming
Dokument41 Seiten
Introduction To Computers and Programming
Rajanikanth Bougolla B
Noch keine Bewertungen
Audio Fingerprinting Sls 24oct2011
Dokument44 Seiten
Audio Fingerprinting Sls 24oct2011
Ravi Teja
Noch keine Bewertungen
Lec 19
Dokument60 Seiten
Lec 19
hancocker
Noch keine Bewertungen
CS 224S/LING 281 Speech Recognition, Synthesis, and Dialogue
Dokument78 Seiten
CS 224S/LING 281 Speech Recognition, Synthesis, and Dialogue
Bolo BT
Noch keine Bewertungen
Speech Recognition Seminar
Dokument19 Seiten
Speech Recognition Seminar
going12345
Noch keine Bewertungen
Viva Speech
Dokument4 Seiten
Viva Speech
Manab Chetia
Noch keine Bewertungen
Chat Bots: Welcome To The World of Living People and Artificial Intelligence Entities Called Bots!
Dokument27 Seiten
Chat Bots: Welcome To The World of Living People and Artificial Intelligence Entities Called Bots!
HazlinaShariff
Noch keine Bewertungen
Text To Speech
Dokument33 Seiten
Text To Speech
shashankif
Noch keine Bewertungen
Institute - Uie Department-Academic UNIT-1: Bachelor of Engineering (Computer Science & Engineering)
Dokument27 Seiten
Institute - Uie Department-Academic UNIT-1: Bachelor of Engineering (Computer Science & Engineering)
Something new in mobile
Noch keine Bewertungen
Modern Recording Techniques - 6th Ed. Wordclock Pp.258-259
Dokument2 Seiten
Modern Recording Techniques - 6th Ed. Wordclock Pp.258-259
Susana Brochado
Noch keine Bewertungen
CSE442 Text
Dokument89 Seiten
CSE442 Text
sanskritiiiii.2002
Noch keine Bewertungen
A Framework For Deepfake V2
Dokument24 Seiten
A Framework For Deepfake V2
Abdullah fawaz altulahi
Noch keine Bewertungen
How It Works: Audio Quality Measurements in The Digital Age
Dokument5 Seiten
How It Works: Audio Quality Measurements in The Digital Age
Lo Shao-Ting
Noch keine Bewertungen
Hands On Lab Customization
Dokument29 Seiten
Hands On Lab Customization
Bell Bou
Noch keine Bewertungen
Chapter Two
Dokument31 Seiten
Chapter Two
latigudata
Noch keine Bewertungen
Home Automation (Speech Recognition) : Algorithm
Dokument1 Seite
Home Automation (Speech Recognition) : Algorithm
Saleh_Saeed_7344
Noch keine Bewertungen
Computer Based Automatic Speech Processing: Pham Van Tuan
Dokument70 Seiten
Computer Based Automatic Speech Processing: Pham Van Tuan
hondaitodung
Noch keine Bewertungen
Adaptive Multi Rate Coder Using ACLP
Dokument45 Seiten
Adaptive Multi Rate Coder Using ACLP
shah_mayana
Noch keine Bewertungen
Whitepaper How AI Speech Models Work
Dokument18 Seiten
Whitepaper How AI Speech Models Work
Vmp
Noch keine Bewertungen
Interactive Telecommunication Systems and Services: Ing. Stanislav Ondáš, PHD
Dokument35 Seiten
Interactive Telecommunication Systems and Services: Ing. Stanislav Ondáš, PHD
Darkeye18
Noch keine Bewertungen
Speech Processing: Lecture 9 - Large Vocabulary Continuous Speech Recognition (LVCSR)
Dokument32 Seiten
Speech Processing: Lecture 9 - Large Vocabulary Continuous Speech Recognition (LVCSR)
trlp1712
Noch keine Bewertungen
Naturalspeech 2: Latent Diffusion Models Are Natural and Zero-Shot Speech and Singing Synthesizers
Dokument19 Seiten
Naturalspeech 2: Latent Diffusion Models Are Natural and Zero-Shot Speech and Singing Synthesizers
Jurusan Teknik Elektro Polnam
Noch keine Bewertungen
Rutorrent Rss Pcre Tutorial.: Understanding Naming Schemes / Tags
Dokument4 Seiten
Rutorrent Rss Pcre Tutorial.: Understanding Naming Schemes / Tags
cacacacaca2600
Noch keine Bewertungen
Paralinguistic Information Processing: - Prosody
Dokument41 Seiten
Paralinguistic Information Processing: - Prosody
Codrin Enea
Noch keine Bewertungen
Automatic Subtitle Generator
Dokument25 Seiten
Automatic Subtitle Generator
ravi060791
0% (1)
Text Analytics
Dokument32 Seiten
Text Analytics
Mahesh Ramalingam
Noch keine Bewertungen
Lecture 4
Dokument48 Seiten
Lecture 4
mohsin
Noch keine Bewertungen
Session 11-12 - Text Analytics
Dokument38 Seiten
Session 11-12 - Text Analytics
Shishir Gupta
Noch keine Bewertungen
Lec 5
Dokument18 Seiten
Lec 5
ريم صلاح علي
Noch keine Bewertungen
Tech Tips Diff Inst
Dokument108 Seiten
Tech Tips Diff Inst
api-250418213
Noch keine Bewertungen
NLP 101 - Machine Learning Seminar 2017
Dokument30 Seiten
NLP 101 - Machine Learning Seminar 2017
Dan
100% (1)
Automatic Speech Recognition
Dokument17 Seiten
Automatic Speech Recognition
Priyanka Solanki
Noch keine Bewertungen
M 1.4 Introduction To Speech Recognition & Synthesis 2
Dokument26 Seiten
M 1.4 Introduction To Speech Recognition & Synthesis 2
Nikhil
Noch keine Bewertungen
Advances in Speech To Speech Translation Technologies
Dokument35 Seiten
Advances in Speech To Speech Translation Technologies
hanoomlee
Noch keine Bewertungen
The Birthday Attack: Discussion
Dokument8 Seiten
The Birthday Attack: Discussion
Dr. Priyanka Mathur
Noch keine Bewertungen
Machine Translation: A Presentation By: Julie Conlonova, Rob Chase, and Eric Pomerleau
Dokument31 Seiten
Machine Translation: A Presentation By: Julie Conlonova, Rob Chase, and Eric Pomerleau
emailmyname
Noch keine Bewertungen
Introduction To Language Modeling Final
Dokument69 Seiten
Introduction To Language Modeling Final
jeysam
Noch keine Bewertungen
Chaitanya Asawa
Dokument8 Seiten
Chaitanya Asawa
Mohammadreza Rostam
Noch keine Bewertungen
Digital SAT
Dokument60 Seiten
Digital SAT
Rakan
Noch keine Bewertungen
NLP For Social Media Lecture 5: Direct Processing of Social Media Data
Dokument35 Seiten
NLP For Social Media Lecture 5: Direct Processing of Social Media Data
RaziAhmed
Noch keine Bewertungen
LabView Tutorial
Dokument288 Seiten
LabView Tutorial
otavarez
Noch keine Bewertungen
Hypre-2.0.0 Usr Manual
Dokument83 Seiten
Hypre-2.0.0 Usr Manual
mojo63_99
Noch keine Bewertungen
5.3.2.8 Packet Tracer - Examine The ARP Table
Dokument6 Seiten
5.3.2.8 Packet Tracer - Examine The ARP Table
Luis Alberto Moreta
Noch keine Bewertungen
Computer Science Project: Java Programming
Dokument145 Seiten
Computer Science Project: Java Programming
Sarthak Rastogi
Noch keine Bewertungen
Introduction To de
Dokument49 Seiten
Introduction To de
Jacinto Siqueira
Noch keine Bewertungen
Frog Jumped Into The Pond
Dokument3 Seiten
Frog Jumped Into The Pond
Ivan Teh RunningMan
Noch keine Bewertungen
New 2
Dokument5 Seiten
New 2
Quazi Warish Ahmad
Noch keine Bewertungen
CVF PG
Dokument867 Seiten
CVF PG
Thiago Veloso
Noch keine Bewertungen
Lambda DG
Dokument487 Seiten
Lambda DG
Sai Sandeep
Noch keine Bewertungen
ENG Change Order Process
Dokument2 Seiten
ENG Change Order Process
soureddy1981
Noch keine Bewertungen
This Manual Is Part of The Tru-Balance Box-Type Sifter and Should Be Kept Near The Machine For Users To Read
Dokument1 Seite
This Manual Is Part of The Tru-Balance Box-Type Sifter and Should Be Kept Near The Machine For Users To Read
miguel
Noch keine Bewertungen
HTML PDF
Dokument25 Seiten
HTML PDF
Anonymous eKs7Jj
Noch keine Bewertungen
CRM Flipkart
Dokument17 Seiten
CRM Flipkart
Harish Krishnan
80% (5)
Lab Manual 11 DSA
Dokument12 Seiten
Lab Manual 11 DSA
inft18111035 KFUEIT
Noch keine Bewertungen
Numerical Methods Questions PDF
Dokument14 Seiten
Numerical Methods Questions PDF
vivek chand
Noch keine Bewertungen
Slidex - Tips - Solutions To Introductory Statistical Mechanics Bowley
Dokument3 Seiten
Slidex - Tips - Solutions To Introductory Statistical Mechanics Bowley
Gouri Panikkaveettil
0% (1)
CM Cases
Dokument5 Seiten
CM Cases
Murk Zeeshan Jokhio
Noch keine Bewertungen
Chapter 3 - Syntax Analyzer
Dokument28 Seiten
Chapter 3 - Syntax Analyzer
Yitbarek Murche
Noch keine Bewertungen
SQL
Dokument181 Seiten
SQL
Vishnu Priya
100% (1)
EPLAN Platform Multi-User Application Recommendation
Dokument20 Seiten
EPLAN Platform Multi-User Application Recommendation
supportLSM
Noch keine Bewertungen
Advantages and Disadvantages of FFT Analyzer Technology
Dokument3 Seiten
Advantages and Disadvantages of FFT Analyzer Technology
Narendra Kulkarni
Noch keine Bewertungen
Laboratory1 - Algorithm Pseudocode Flowchart
Dokument8 Seiten
Laboratory1 - Algorithm Pseudocode Flowchart
Divine Vine
100% (1)
Irmo Errf2011
Dokument3 Seiten
Irmo Errf2011
Lorna U. Fernandez-Espinoza
Noch keine Bewertungen
Sistema de Registo e Login Com PHP e MySql
Dokument3 Seiten
Sistema de Registo e Login Com PHP e MySql
tzaraujo
Noch keine Bewertungen
Using Tooltips On Controls
Dokument4 Seiten
Using Tooltips On Controls
yuo2
Noch keine Bewertungen
MA-2018
Dokument261 Seiten
MA-2018
Dr Luis e Valdez rico
0% (1)
Komal Devi CV
Dokument8 Seiten
Komal Devi CV
Muhammed Amjad Islam
Noch keine Bewertungen
စံပယ္ျဖဴႏု လွပ္၍လြင့္ေသာ PDF
Dokument486 Seiten
စံပယ္ျဖဴႏု လွပ္၍လြင့္ေသာ PDF
Thu Kha
100% (1)
An Approach To Detect Remote Access Trojan in The Early Stage of Communication
Dokument8 Seiten
An Approach To Detect Remote Access Trojan in The Early Stage of Communication
Deri Andhany
Noch keine Bewertungen