A Framework For Speech Recognition Development

A Framework For Speech Application Development
By Jason Elroy Martis, NMAMIT, Nitte (jason1987martis@gmail.com)
Agenda
Introduction
Applications
Working and Types
FSM
Problems
Proposed Solution
Results
Conclusion
References
Normal Ways of Interaction

Normal interaction works in two basic forms:
Language
Meta-language (body language)
Both forms occur simultaneously, which makes the interaction richer.
Language communicated through

Language is communicated in the form of speech.
What is speech? Speech is the vocalized form of human communication. It is based on the syntactic combination of lexical units and names drawn from a vocabulary. It is the most natural way for us to interact.
Example: "Hey! How are you?"
Hence Speech Recognition (SR)

Speech recognition is the process of converting a speech signal into a sequence of lexical units by means of an algorithm. In other words, we give an instruction by voice and the computer recognizes it.
Is this necessary? Yes: it brings our natural way of communicating to the electronic and virtual world.
Applications of SR
There are innumerable applications. Some examples:
Military uses: remote command and control centers (planes, satellites, etc.)
Health care: automated medical prescriptions
Educational uses: helps both teachers and students
So How Does SR Work?

A very simple model demonstrates how SR works.
Approaches to SR
SR approaches fall into three broad groups:
Acoustic-phonetic approach (works on phonemes)
Pattern recognition approach (works on patterns)
Artificial intelligence approach (advanced functionality)
Acoustic Phonetic Approach

Requires knowledge of phonetics (the language of enunciation).
Recognize the phonemes, convert them to lexical units, and match those to words.
Pattern Recognition
Pattern recognition works in two phases:
Pattern training
Pattern comparison
Pattern training is modeled by an FSM (finite state machine). In simple terms, speech templates are created and stored.
In the comparison phase, the speaker's words are compared against the stored templates and verified. If matched: accept. If not matched: reject.
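The two phases above can be sketched roughly as follows. This is only an illustration, not the method used in the slides: the `TemplateRecognizer` class, the one-dimensional feature sequences, and the choice of dynamic time warping (DTW) as the comparison measure are all assumptions made for the example.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] holds the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


class TemplateRecognizer:
    """Hypothetical template store: train() saves templates, recognize() compares."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed maximum distance for an "accept"
        self.templates = {}         # word -> stored feature sequence

    def train(self, word, features):
        # Pattern training: create and store a template for the word.
        self.templates[word] = list(features)

    def recognize(self, features):
        # Pattern comparison: find the closest stored template.
        best_word, best_dist = None, inf
        for word, template in self.templates.items():
            dist = dtw_distance(features, template)
            if dist < best_dist:
                best_word, best_dist = word, dist
        # Accept only if the best match is close enough, otherwise reject.
        return best_word if best_dist <= self.threshold else None


rec = TemplateRecognizer(threshold=2.0)
rec.train("yes", [1, 3, 5, 3, 1])
rec.train("no", [5, 5, 1, 1])
print(rec.recognize([1, 3, 3, 5, 3, 1]))  # a slower "yes"
print(rec.recognize([9, 9, 9]))           # far from every template
```

DTW is used here because it tolerates the same word being spoken faster or slower; a real system would compare multi-dimensional acoustic features rather than single numbers.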
Pattern Recognition Contd

Model:
Problem: different accents can cause mismatches against the stored templates.
Artificial Intelligence Approach

This approach overcomes some disadvantages of the template-based approach.
It maintains a knowledge base and automatically corrects words.
E.g. "What your name?" is detected as an error.
It also overcomes some problems of speaker variance and other constraints of speech, e.g. culture, accent, etc.
Speech Recognition Model
Finite State Machines Based SR Model

It is a very simple approach with two main stages:
The acceptor
The transducer
The acceptor accepts or rejects lexical units.
The transducer handles the transition from one set of words to another as the input grows.
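A minimal sketch of the acceptor and transducer roles, assuming a tiny two-word command grammar. The states, words, and output symbols below are hypothetical and chosen only for illustration.

```python
# Hypothetical grammar: "open file", "open window", "close file", "close window".
# Each entry maps (state, input word) -> (next state, output symbol).
TRANSITIONS = {
    ("start", "open"): ("verb", "OPEN"),
    ("start", "close"): ("verb", "CLOSE"),
    ("verb", "file"): ("done", "FILE"),
    ("verb", "window"): ("done", "WINDOW"),
}
ACCEPTING = {"done"}


def transduce(words):
    """Transducer: walk the machine word by word, emitting one output per step.

    Returns the output symbols, or None if the input is rejected.
    """
    state, outputs = "start", []
    for word in words:
        step = TRANSITIONS.get((state, word))
        if step is None:
            return None  # no transition for this word: reject
        state, symbol = step
        outputs.append(symbol)
    return outputs if state in ACCEPTING else None


def accepts(words):
    """Acceptor: only answers whether the word sequence is valid."""
    return transduce(words) is not None


print(accepts(["open", "file"]))       # valid command
print(accepts(["file", "open"]))       # wrong order
print(transduce(["close", "window"]))  # output symbols for a valid command
```

Sharing one transition table keeps the two roles consistent: the acceptor is simply the transducer with its output ignored.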
FSM based SR Model Contd

What if a match is ambiguous because two words sound the same?
"Know" and "no" sound the same. How do we overcome this problem?
Solution: attach weights to the candidate words so that the more likely one is preferred. This improves recognition.
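One way to picture the weighting idea is to let the previous word decide between homophones. The sound key, the weight table, and its values below are invented for this sketch; a real system would learn such weights from data.

```python
# Hypothetical homophone table: one sound key, several candidate words.
HOMOPHONES = {"n-oh": ["know", "no"]}

# Hypothetical weights for (previous word, candidate); higher means more likely.
WEIGHTS = {
    ("i", "know"): 0.9,
    ("i", "no"): 0.1,
    ("say", "know"): 0.2,
    ("say", "no"): 0.8,
}


def pick_word(previous_word, sound):
    """Return the candidate with the highest weight after previous_word."""
    candidates = HOMOPHONES[sound]
    # Unknown pairs get weight 0.0; ties keep the first listed candidate.
    return max(candidates, key=lambda w: WEIGHTS.get((previous_word, w), 0.0))


print(pick_word("i", "n-oh"))    # "I know"
print(pick_word("say", "n-oh"))  # "say no"
```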
Performance of Speech based Systems

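The performance of speech-based systems is measured on two main bases:
WER (word error rate)
WRR (word recognition rate)
WER indicates how many words were recognized incorrectly, relative to the number of words actually spoken.
WRR indicates how many words were recognized correctly; it is the complement of WER (WRR = 1 - WER).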
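The slides do not give formulas, so the sketch below assumes the commonly used definitions: WER = (S + D + I) / N, where S, D, and I are the substituted, deleted, and inserted words and N is the number of words in the reference, and WRR = 1 - WER. The function names are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum number of word edits turning ref[:i] into hyp[:j].
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[len(ref)][len(hyp)] / len(ref)


def word_recognition_rate(reference, hypothesis):
    """WRR as the complement of WER (assumed definition)."""
    return 1.0 - word_error_rate(reference, hypothesis)


# One word ("is") is dropped out of four reference words.
print(word_error_rate("what is your name", "what your name"))
print(word_recognition_rate("what is your name", "what your name"))
```

Because insertions count as errors, WER can exceed 1 and WRR can become negative under this definition when the recognizer outputs many extra words.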
So What Is New in This?

There is nothing new in speech recognition itself: it has developed from almost nothing to being almost everywhere. Everyone is attracted to it and is developing lots of apps on it. This causes an integrity issue:
All apps are built from scratch.
There can be app conflicts (two different apps on the same computer): both apps wait for the same word and conflict on the same machine.
Licensing on these machines: an ordinary developer can do nothing but wait until an SDK arrives.
How Can We Solve This?

We combine both of these approaches:
Allow developers to build from scratch (this keeps them independent).
Provide a platform where they can work together.
So why not build a framework in which users can build things easily, and from scratch? We don't lose anything, and we resolve the integrity issues.
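To make the conflict problem concrete, here is one possible shape for such a shared layer. It is purely a sketch: the `SpeechFramework` class, its `register` and `dispatch` methods, and the rule that a phrase belongs to exactly one app are assumptions, not the design presented in the slides.

```python
class SpeechFramework:
    """Hypothetical shared layer between the recognizer and independent apps."""

    def __init__(self):
        self._owners = {}  # phrase -> (app name, handler)

    def register(self, app_name, phrase, handler):
        """Claim a phrase for an app; refuse if another app already owns it."""
        key = phrase.lower()
        owner = self._owners.get(key)
        if owner is not None and owner[0] != app_name:
            raise ValueError(f"'{phrase}' is already registered by {owner[0]}")
        self._owners[key] = (app_name, handler)

    def dispatch(self, recognized_phrase):
        """Route a recognized phrase to the single app that owns it."""
        owner = self._owners.get(recognized_phrase.lower())
        if owner is None:
            return None  # no app is waiting for this phrase
        _, handler = owner
        return handler(recognized_phrase)


framework = SpeechFramework()
framework.register("player", "play", lambda phrase: "player handles " + phrase)
framework.register("editor", "save", lambda phrase: "editor handles " + phrase)
print(framework.dispatch("play"))

try:
    framework.register("editor", "play", lambda phrase: "editor handles " + phrase)
except ValueError as error:
    print("conflict:", error)
```

Each app is still written independently, but the conflict is caught at registration time instead of at run time, when two apps would both react to the same word.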
How Does This Framework Look?

Notice how the integrity issue is resolved and apps are developed easily.
Results
Notice how the type of speech affects the accuracy:
Type of Speech              Accuracy
Normal dictionary speech    50-90%
Choices (customized)        90%
Choices (general)           80%
Individual letters          30%
Customized phonetics        70%
Conclusion
Speech is a natural way of communication.
Numerous applications of speech exist.
There are various approaches, each with its own pros and cons.
FSMs are one way to make the job easier and better.
There are still many problems:
Recognition problems
Integrity issues
So we need a platform-independent framework that can solve these issues and make life easier for speech developers.
References
[1] Weinstein, C.J. Military and government applications of human-machine communication by voice. In Proceedings of the Natl. Acad. Sci. USA, Vol. 92, pp. 10011-10016, October 1995.
[2] Dat Tat Tran. Fuzzy Approaches to Speech and Speaker Recognition. A thesis submitted for the degree of Doctor of Philosophy of the University of Canberra.
[3] R.K. Moore. Twenty things we still don't know about speech. Proc. CRIM/FORWISS Workshop on Progress and Prospects of Speech Research and Technology, 1994.
[4] Sadaoki Furui. 50 years of Progress in Speech and Speaker Recognition Research. ECTI Transactions on Computer and Information Technology, Vol. 1, No. 2, November 2005.
[5] Willie Walker et al. Sphinx-4: A Flexible Open Source Framework for Speech Recognition. http://cmusphinx.sourceforge.net/sphinx4
[6] M.A. Anusuya. Speech Recognition by Machine: A Review. In (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009. http://arxiv.org/ftp/arxiv/papers/1001/1001.2267.pdf
[7] Neann Mathai. A Literature Survey of Speech Recognition and Hidden Markov Models. http://shenzi.cs.uct.ac.za/~honsproj/cgibin/view/2009/katz_mathai_sobey.zip/Speech_Katz_Mathai_Sobey/Downloads/NeannMathaiLiteratureSurvey.pdf
[8] Pavel Stemberk. Speech recognition based on FSM and HTK toolkits. http://stembep.wz.cz/!papers/Zilina-dt04/zildt04.pdf
[9] Steve Renals. Speech recognition. http://dsp-book.narod.ru/rec-notes.pdf
