A
MAJOR PROJECT
Submitted For The Partial Fulfilment Of The Requirement
For The Award Of Degree Of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE & ENGINEERING
Submitted by:
1. Aakash Shrivastava(0101CS101001)
2. Ashish Kumar Namdeo(0101CS101024)
3. Avinash Dongre(0101CS101026)
4. Chitransh Surheley(0101CS101031)
Guided By:
Prof. Shikha Agarwal
Dr. V.K.Sethi
(Director, UIT-RGPV)
DECLARATION BY CANDIDATE
We hereby declare that the work being presented in the major project Expert System
Voice Assistant is submitted in partial fulfillment of the requirement for the award of the
Bachelor Degree in Computer Science & Engineering. The work, which has been carried out at
University Institute of Technology, RGPV, Bhopal, is an authentic record of our work carried
out under the guidance of Prof. Shikha Agrawal, Department of Computer Science & Engineering,
UIT-RGPV, Bhopal.
The matter written in this project has not been submitted by us for the award of any other
degree.
Aakash Shrivastava(0101CS101001)
Ashish Kumar Namdeo(0101CS101024)
Avinash Dongre(0101CS101026)
Chitransh Surheley(0101CS101031)
ACKNOWLEDGEMENT
We take this opportunity to express our cordial gratitude and deep sense of indebtedness to our
guide, Prof. Shikha Agrawal, Department of Computer Science and Engineering, for her valuable
guidance and inspiration throughout the project duration. We feel thankful to her for her
innovative ideas, which led to the successful completion of this project work. She always
welcomed our problems and helped us clear our doubts. We will always be grateful to her for
providing us moral support and sufficient time.
We owe our sincere thanks to Dr. Sanjay Silakari (HOD, CSE) who helped us duly in time
during our project work in the Department.
At the same time, we would like to thank all other faculty members and all non-teaching staff in
Computer Science and Engineering Department for their valuable co-operation.
Aakash Shrivastava(0101CS101001)
Ashish Kumar Namdeo(0101CS101024)
Avinash Dongre(0101CS101026)
Chitransh Surheley(0101CS101031)
Abstract
A speech interface to the computer is the next big step that computer science needs to take for
general users, and speech recognition will play an important role in taking technology to them.
Our goal is to create speech recognition software that can recognise spoken words. This
report takes a brief look at the basic building blocks of speech recognition, speech synthesis,
and overall human-computer interaction. The most important purpose of this project is
to understand the interface between a person and a computer. Traditional or orthodox means of
interaction are the keyboard, mouse, or other input devices, but nowadays computing has
become a more sophisticated and complex operation. With these properties we have the
advantage and the resources to think about building a more modern interface which will allow
a more natural-looking interaction. So in this project, we have tried to develop an
application which makes human-computer interaction more interesting and user
friendly. It is called the Expert System Voice Assistant; the main feature of this project is
that it takes human voice as input, processes it accordingly, performs the given task, and
responds at the end. This project is a digital life assistant which mainly uses human
communication means such as Twitter, instant messaging, and voice to create a two-way connection
between a human and his computer, controlling power, documents, social media, and much
more. In our project we mainly use voice as the means of communication, so it is basically a speech
recognition application. The concept of speech technology really encompasses two
technologies: the synthesizer and the recognizer. A speech synthesizer takes text as input and produces
an audio stream as output. A speech recognizer, on the other hand, does the opposite: it takes an
audio stream as input and turns it into a text transcription. The voice is a signal carrying a great deal
of information, and direct analysis and synthesis of the complex voice signal is difficult because of the amount of
information contained in the signal. Therefore digital signal processes such as feature
extraction and feature matching are introduced to represent the voice signal. In this project
we directly use a speech engine whose feature extraction technique is the mel-scaled frequency
cepstrum. The mel-scaled frequency cepstral coefficients (MFCCs) derived from Fourier
transform and filter bank analysis are perhaps the most widely used front-ends in state-of-the-art
speech recognition systems. Our aim is to create more and more functionality that can
help humans in their daily life and also reduce their effort.
Chapter 1
1. Introduction
Speech is an effective and natural way for people to interact with applications, complementing
or even replacing the use of mice, keyboards, controllers, and gestures. A hands-free, yet
accurate way to communicate with applications, speech lets people be productive and stay
informed in a variety of situations where other interfaces cannot. Speech recognition is a
topic that is very useful in many applications and environments in our daily life. Generally,
a speech recognizer is a machine which understands humans and their spoken words in some
way and can act thereafter. A different aspect of speech recognition is to help people
with a functional disability or other kinds of handicap. To make their daily chores easier, voice
control could be helpful: with their voice they could operate the system. This leads to the
discussion of intelligent homes, where these operations can be made available for the
common man as well as for the handicapped. Voice-activated systems and gesture control systems
have taken the experience of naive end users to the next level. Present-day users are able
to access or control the system without making physical contact with the computer. The
proposed model presents a new approach to voice-activated control systems which improves
the response time and user experience by looking beyond the steps of speech recognition and
focusing on the post-processing step of natural language processing. The proposed method
conceives the system as a deterministic finite state automaton, where each state is allowed a
finite set of keywords, which will be listened for by the speech recognition system. This is
achieved by the introduction of a new system to handle finite automata, called the Switch State
Mechanism. Natural language processing is used to regularly update the state keywords
and give the user a lifelike interaction with the computer.
With the input functionality of speech recognition, your application can monitor the state,
level, and format of the input signal, and receive notification about problems that might
interfere with successful recognition. You can create grammars programmatically using
constructors and methods on the GrammarBuilder and Choices classes. Your application can
dynamically modify programmatically created grammars while it is running. The structure of
grammars authored using these classes is independent of the Speech Recognition Grammar
Specification.
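The programmatic grammar construction described above can be sketched as follows; the command words and grammar name are illustrative, not taken from the project's source:

```csharp
using System.Speech.Recognition;  // requires a reference to System.Speech.dll

class GrammarDemo
{
    static Grammar BuildCommandGrammar()
    {
        // A small set of command words the recognizer should listen for.
        var commands = new Choices("weather", "news", "time", "lock");

        // Optionally require a wake word before each command.
        var builder = new GrammarBuilder("computer");
        builder.Append(commands);

        return new Grammar(builder) { Name = "AssistantCommands" };
    }
}
```

A grammar built this way can later be passed to a recognizer's LoadGrammar method, constraining recognition to exactly these phrases.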
Voice recognition fundamentally functions as a pipeline that converts PCM (Pulse Code
Modulation) digital audio from a sound card into recognized speech. The elements of the
pipeline are:
1. Transform the PCM digital audio into a better acoustic representation
2. Apply a "grammar" so the speech recognizer knows what phonemes to expect. A
grammar could be anything from a context-free grammar to full-blown English.
3. Figure out which phonemes are spoken.
4. Convert the phonemes into words.
Several open-source speech recognition packages are available, and published comparisons
of public-domain software tools for speech recognition exist. Some commercial
software, such as IBM's ViaVoice, is also available.
1.1.1 SIRI
SIRI is an intelligent personal assistant and knowledge navigator which works as an
application for Apple Inc.'s iOS. The application uses a natural language user interface to
answer questions, make recommendations, and perform actions by delegating requests to a set
of Web services. Apple claims that the software adapts to the user's individual preferences
over time and personalizes results. The name Siri is Norwegian, meaning "beautiful woman
who leads you to victory", and comes from the intended name for the original developer's first
child.
Siri was originally introduced as an iOS application available in the App Store by Siri, Inc.,
which was acquired by Apple on April 28, 2010. Siri, Inc. had announced that their software
would be available for BlackBerry and for phones running Android, but all development
efforts for non-Apple platforms were cancelled after the acquisition by Apple.
Siri has been an integral part of iOS since iOS 5 and was introduced as a feature of the iPhone
4S on October 14, 2011. Siri was added to the third-generation iPad with the release of iOS 6
in September 2012, and has been included on all iOS devices released during or after October
2012. Siri has several fascinating features where you can call or text someone, search
anything, open any app etc with your voice which is very helpful indeed.
1.1.2 S-VOICE
S Voice is an intelligent personal assistant and knowledge navigator which is only available as
a built-in application for the Samsung Galaxy smartphones. The application uses a natural
language user interface to answer questions, make recommendations, and perform actions by
delegating requests to a set of Web services. It is based on the Vlingo personal assistant.
Some of the capabilities of S Voice include making appointments, opening apps, setting
alarms, updating social network websites such as Facebook or Twitter and navigation. S Voice
also offers efficient multitasking as well as automatic activation features, for example when
the car engine is started.
S Voice offers largely the same features as Siri.
The Expert System Voice Assistant is based on the combination of three major operations:
1. Speech recognition
2. Intermediate operations and result creation
3. Speech synthesis
The acoustic model represents the acoustic sounds of a language, and can be trained to
recognize the characteristics of a particular user's speech patterns and acoustic
environments.
The lexicon lists a large number of the words in the language, and provides information
on how to pronounce each word.
The language model represents the ways in which the words of a language are combined.
For any given segment of sound, there are many things the speaker could potentially be saying.
The quality of a recognizer is determined by how good it is at refining its search, eliminating the
poor matches, and selecting the more likely matches. This depends in large part on the quality of
its language and acoustic models and the effectiveness of its algorithms, both for processing
sound and for searching across the models.
Grammars
While the built-in language model of a recognizer is intended to represent a comprehensive
language domain (such as everyday spoken English), a speech application will often need to
process only certain utterances that have particular semantic meaning to that application. Rather
than using the general purpose language model, an application should use a grammar that
constrains the recognizer to listen only for speech that is meaningful to the application. This
provides benefits such as enabling the recognition engine to specify the semantic values
inherent in the recognized text.
1.2.1 Algorithms
Both acoustic modeling and language modeling are important parts of modern statistically based speech recognition algorithms. Hidden Markov models (HMMs) are widely used in
many systems. Language modeling is also used in many other natural language processing
applications such as document classification or statistical machine translation.
Hidden Markov models
Modern general-purpose speech recognition systems are based on Hidden Markov Models.
These are statistical models that output a sequence of symbols or quantities. HMMs are used
in speech recognition because a speech signal can be viewed as a piecewise stationary signal
or a short-time stationary signal. In a short time-scale (e.g., 10 milliseconds), speech can be
approximated as a stationary process. Speech can be thought of as a Markov model for many
stochastic purposes.
Another reason why HMMs are popular is because they can be trained automatically and are
simple and computationally feasible to use. In speech recognition, the hidden Markov model
would output a sequence of n-dimensional real-valued vectors (with n being a small integer,
such as 10), outputting one of these every 10 milliseconds. The vectors would consist of
cepstral coefficients, which are obtained by taking a Fourier transform of a short time window
of speech and decorrelating the spectrum using a cosine transform, then taking the first (most
significant) coefficients. The hidden Markov model will tend to have in each state a statistical
distribution that is a mixture of diagonal covariance Gaussians, which will give a likelihood
for each observed vector. Each word, or (for more general speech recognition systems), each
phoneme, will have a different output distribution; a hidden Markov model for a sequence of
words or phonemes is made by concatenating the individual trained hidden Markov models
for the separate words and phonemes.
Described above are the core elements of the most common, HMM-based approach to speech
recognition. Modern speech recognition systems use various combinations of a number of
standard techniques in order to improve results over the basic approach described above. A
typical large-vocabulary system would need context dependency for the phonemes (so
phonemes with different left and right context have different realizations as HMM states); it
would use cepstral normalization to normalize for different speaker and recording conditions;
for further speaker normalization it might use vocal tract length normalization (VTLN) for
male-female normalization and maximum likelihood linear regression(MLLR) for more
general speaker adaptation. The features would have so-called delta and delta-delta
coefficients to capture speech dynamics and in addition might use heteroscedastic linear
discriminant analysis (HLDA); or might skip the delta and delta-delta coefficients and use
splicing and an LDA-based projection followed perhaps by heteroscedastic linear discriminant
analysis or a global semi-tied covariance transform (also known as maximum likelihood linear
transform, or MLLT). Many systems use so-called discriminative training techniques that
dispense with a purely statistical approach to HMM parameter estimation and instead optimize
some classification-related measure of the training data. Examples are maximum mutual
information (MMI), minimum classification error (MCE) and minimum phone error (MPE).
Decoding of the speech (the term for what happens when the system is presented with a new
utterance and must compute the most likely source sentence) would probably use the Viterbi
algorithm to find the best path, and here there is a choice between dynamically creating a
combination hidden Markov model, which includes both the acoustic and language model
information, and combining it statically beforehand (the finite state transducer, or FST,
approach).
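As a toy illustration of the Viterbi decoding mentioned above (not code from this project), the sketch below finds the most likely hidden state path for a small discrete HMM; real recognizers work with Gaussian-mixture likelihoods over far larger state graphs, and all probabilities here are made up:

```csharp
using System;

class ViterbiDemo
{
    // Viterbi over a discrete HMM: returns the most likely state path
    // for an observation sequence.
    static int[] Viterbi(int[] obs, double[] start, double[,] trans, double[,] emit)
    {
        int nStates = start.Length, T = obs.Length;
        var delta = new double[T, nStates];   // best path probability so far
        var psi = new int[T, nStates];        // back-pointers

        for (int s = 0; s < nStates; s++)
            delta[0, s] = start[s] * emit[s, obs[0]];

        for (int t = 1; t < T; t++)
            for (int s = 0; s < nStates; s++)
            {
                double best = -1; int arg = 0;
                for (int p = 0; p < nStates; p++)
                {
                    double v = delta[t - 1, p] * trans[p, s];
                    if (v > best) { best = v; arg = p; }
                }
                delta[t, s] = best * emit[s, obs[t]];
                psi[t, s] = arg;
            }

        // Pick the best final state, then follow back-pointers.
        var path = new int[T];
        double bestEnd = -1;
        for (int s = 0; s < nStates; s++)
            if (delta[T - 1, s] > bestEnd) { bestEnd = delta[T - 1, s]; path[T - 1] = s; }
        for (int t = T - 1; t > 0; t--)
            path[t - 1] = psi[t, path[t]];
        return path;
    }

    static void Main()
    {
        // Two hidden states, three observation symbols; illustrative values.
        double[] start = { 0.6, 0.4 };
        double[,] trans = { { 0.7, 0.3 }, { 0.4, 0.6 } };
        double[,] emit = { { 0.5, 0.4, 0.1 }, { 0.1, 0.3, 0.6 } };
        int[] path = Viterbi(new[] { 0, 1, 2 }, start, trans, emit);
        Console.WriteLine(string.Join(" ", path));
    }
}
```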
A possible improvement to decoding is to keep a set of good candidates instead of just
keeping the best candidate, and to use a better scoring function (rescoring) to rate these good
candidates so that we may pick the best one according to this refined score. The set of
candidates can be kept either as a list (the N-best list approach) or as a subset of the models (a
lattice). Rescoring is usually done by trying to minimize the Bayes risk (or an approximation
thereof): Instead of taking the source sentence with maximal probability, we try to take the
sentence that minimizes the expectation of a given loss function with regards to all possible
transcriptions (i.e., we take the sentence that minimizes the average distance to other possible
sentences weighted by their estimated probability). The loss function is usually the
Levenshtein distance, though it can be different distances for specific tasks; the set of possible
transcriptions is, of course, pruned to maintain tractability. Efficient algorithms have been
devised to rescore lattices represented as weighted finite state transducers with edit distances
represented themselves as a finite state transducer verifying certain assumptions.
Dynamic time warping (DTW)-based speech recognition
Dynamic time warping is an approach that was historically used for speech recognition but has
now largely been displaced by the more successful HMM-based approach.
Dynamic time warping is an algorithm for measuring similarity between two sequences that
may vary in time or speed. For instance, similarities in walking patterns would be detected,
even if in one video the person was walking slowly and if in another he or she were walking
more quickly, or even if there were accelerations and decelerations during the course of one
observation. DTW has been applied to video, audio, and graphics; indeed, any data that can
be turned into a linear representation can be analyzed with DTW.
A well-known application has been automatic speech recognition, to cope with different
speaking speeds. In general, it is a method that allows a computer to find an optimal match
between two given sequences (e.g., time series) with certain restrictions. That is, the
sequences are "warped" non-linearly to match each other. This sequence alignment method is
often used in the context of hidden Markov models.
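The warping idea can be captured in a few lines of dynamic programming. The sketch below (illustrative, not from the project) computes the classic DTW distance between two one-dimensional sequences; in speech recognition the sequence elements would be feature vectors rather than scalars:

```csharp
using System;

class DtwDemo
{
    // Classic dynamic time warping distance between two 1-D sequences.
    static double Dtw(double[] a, double[] b)
    {
        int n = a.Length, m = b.Length;
        var d = new double[n + 1, m + 1];
        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= m; j++)
                d[i, j] = double.PositiveInfinity;
        d[0, 0] = 0;

        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++)
            {
                double cost = Math.Abs(a[i - 1] - b[j - 1]);
                // Best of the insertion, deletion, and match moves.
                d[i, j] = cost + Math.Min(d[i - 1, j],
                                 Math.Min(d[i, j - 1], d[i - 1, j - 1]));
            }
        return d[n, m];
    }

    static void Main()
    {
        // The second sequence is a time-stretched version of the first;
        // DTW reports a small distance despite the different lengths.
        double[] a = { 0, 1, 2, 1, 0 };
        double[] b = { 0, 0, 1, 1, 2, 2, 1, 0 };
        Console.WriteLine(Dtw(a, b));
    }
}
```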
Neural networks
Neural networks emerged as an attractive acoustic modeling approach in ASR in the late
1980s. Since then, neural networks have been used in many aspects of speech recognition such
as phoneme classification, isolated word recognition, and speaker adaptation.
In contrast to HMMs, neural networks make no assumptions about feature statistical
properties and have several qualities making them attractive recognition models for speech
recognition. When used to estimate the probabilities of a speech feature segment, neural
networks allow discriminative training in a natural and efficient manner. Few assumptions on
the statistics of input features are made with neural networks. However, in spite of their
effectiveness in classifying short-time units such as individual phones and isolated words,
neural networks are rarely successful for continuous recognition tasks, largely because of their
lack of ability to model temporal dependencies. Thus, one alternative approach is to use neural
networks as a pre-processing e.g. feature transformation, dimensionality reduction, for the
HMM based recognition.
A text-to-speech system (or "engine") is composed of two parts: a front-end and a back-end.
The front-end has two major tasks. First, it converts raw text containing symbols like numbers
and abbreviations into the equivalent of written-out words. This process is often called text
normalization, pre-processing, or tokenization. The front-end then assigns phonetic
transcriptions to each word, and divides and marks the text into prosodic units, like phrases,
clauses, and sentences. The process of assigning phonetic transcriptions to words is called
text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody
information together make up the symbolic linguistic representation that is output by the front-end. The back-end, often referred to as the synthesizer, then converts the symbolic linguistic
representation into sound. In certain systems, this part includes the computation of the target
prosody (pitch contour, phoneme durations), which is then imposed on the output speech.
Chapter 2
The Microsoft Speech API (SAPI) was developed by Microsoft to allow the use of speech recognition and speech synthesis within Windows
applications. Applications that use SAPI include Microsoft Office, Microsoft Agent and
Microsoft Speech Server. In general, the API has been designed so that a software
developer can write an application to perform speech recognition and synthesis by using a
standard set of interfaces, accessible from a variety of programming languages. In addition, it
is possible for a third-party company to produce its own speech recognition and text-to-speech
engines or adapt existing engines to work with SAPI. Basically, the speech platform
consists of an application runtime that provides speech functionality, an Application Program
Interface (API) for managing the runtime, and Runtime Languages that enable speech
recognition and speech synthesis (text-to-speech or TTS) in specific languages.
There have been two main 'families' of the Microsoft Speech API. SAPI versions 1 through 4
are all similar to each other, with extra features in each newer version. SAPI 5 however was a
completely new interface, released in 2000. Since then several sub-versions of this API have
been released.
Chapter 3
3. Problem Description
A voice assistant is not a very traditional or orthodox application; such applications are not
generally available in a very broad context. Another issue is that not all people can interact with
the computer via orthodox input methods like the keyboard or mouse. Some people with a
physical disability, or those who are unable to see, may find it very difficult to interact with
the computer, but with the help of this application they can operate the computer as
smoothly as anyone else. The problem is that we have to combine the features of
speech recognition, interpretation, system manipulation, command generation, and speech synthesis.
We want the computer to recognize our spoken words and we want the spoken operation to be
performed. After all that, we want the application to respond with text-to-speech or other
synthetic voice feedback.
We have to make sure that the application understands every command and provides the
results with feedback.
Chapter 4
4. Proposed Work
Here are a few of the proposed functions that are included in the project:
1. Weather - Gives the local weather for the current day. You can set a specific location and,
when connected to the internet, easily ask for the local weather update and forecast. The
assistant will vocally describe the current conditions.
2. Forecast - Gives the local weather for the next few days. You can speak the word
forecast and get a vocal glimpse of the coming conditions.
3. News - Shows the latest news headlines from the BBC. You just have to make a statement
about news, or anything similar, and it will either read the news to you or show it
on the internet.
4. Alarm - Starts the alarm chain command for wake-up. You have to specify the time and it
is done; it will notify you when the set time arrives.
5. Time - Displays the time.
6. Date - Displays the date; date and time displays both the date and time.
7. Mute - Mutes system volume.
8. Unmute - Unmutes system volume.
9. Radio - Streams radio from the internet, instantly.
10. Introduce - Gives a general introduction to the Expert System Voice Assistant.
11. Speak - Plays a sample of the application's current TTS voice. For more
accuracy, various modern TTS voices can be embedded into the project, giving you the
option of changing the application's voice according to your preference.
12. Killtask - Kills a specified task. You have to specify vocally which running task is to be
killed.
13. CMD - Starts a new command prompt window.
14. Start or close any program or directory - You can start any program by saying its
name, open or close any directory by speaking its name, and switch between them by voice
as well. Confirmation of the start and termination can be vocal.
15. Tasklist - Views current running processes.
16. Lock - Locks the workstation.
17. Screen off - Turns off the monitor; can also dim the brightness of the screen.
18. System specific tasks - You can control your computer's regular operations via voice
commands. For example, you can turn the computer off by saying turn off, put it to sleep
by saying sleep, or open and close the disk tray by voice command.
19. Open any website - You can open a specific website by calling its name. This includes many
famous websites.
20. What is there to offer - The first thing is to know the potential and capabilities of
the project, so if the user says what can you do, the application will show
the list of commands and operations it can perform.
21. Print this page - This command is said to print a specific page. The application will take
the spoken word print as input, and the status of the task will be provided as output
via voice.
22. Screenshot anything - You can take a screenshot of any page or window by saying the
word.
23. Play music or video locally - You can simply instruct the assistant to play a local
music or video file on the basis of name, artist, genre, etc.
24. Multimedia control - You can control the volume, select the playlist, and go to the next or
previous track by voice commands.
25. Manage your email - You can manage and check for new emails by saying
something like check mail. The system will respond vocally to the command and
can read your emails for you.
26. Presentation control - You can start the presentation, go to the previous or next slide, and
end the presentation.
27. Delete file - You can delete any selected file by saying this command.
28. Cut/Copy/Paste - You can perform these operations on any selected file or text.
29. Select all - Say it and it will select the whole document or all the files.
Program Options
Start Automatically - If checked, this program will be added to your start-up folder so that it
will start automatically each time you start Windows.
Show Progress Bars - The program can monitor your usage of the mouse and keyboard and
show you the progress you are making at using your voice instead of the mouse and keyboard.
Progress is measured on several dimensions including: mouse clicks, mouse movement,
keyboard letters, and navigation/function keys.
General options:
1. Open and Close Programs
2. Navigate Programs/Folders
3. Switch or Minimize Windows
4. Change Settings
Chapter 5
5. Design and Development
5.1 Requirements:
Hardware: Pentium processor, 512 MB of RAM, 10 GB HDD.
OS: Windows.
Language: C#.
Tools: .NET Framework 4.5, Microsoft Visual Studio 2010, voice macros.
The speech signal and all its characteristics can be represented in two different domains, the
time domain and the frequency domain. A speech signal is a slowly time-varying signal in the
sense that, when examined over a short period of time (between 5 and 100 ms), its
characteristics are short-time stationary. This is not the case if we look at a speech signal over
a longer time perspective (approximately T > 0.5 s). In this case the signal's characteristics are
non-stationary, meaning that they change to reflect the different sounds spoken by the talker.
To be able to use a speech signal and interpret its characteristics in a proper manner, some
kind of representation of the speech signal is preferred.
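As an illustration of short-time analysis, the sketch below splits a buffer of PCM samples into overlapping 25 ms frames with a 10 ms hop, the framing step that precedes feature extraction such as MFCC computation. The frame and hop durations are typical values assumed for illustration, not figures taken from this report:

```csharp
using System;
using System.Collections.Generic;

class Framing
{
    // Split a PCM signal into overlapping short-time frames.
    static List<double[]> Frame(double[] samples, int sampleRate,
                                double frameMs, double hopMs)
    {
        int frameLen = (int)(sampleRate * frameMs / 1000);  // e.g. 400 at 16 kHz
        int hopLen = (int)(sampleRate * hopMs / 1000);      // e.g. 160 at 16 kHz
        var frames = new List<double[]>();
        for (int start = 0; start + frameLen <= samples.Length; start += hopLen)
        {
            var frame = new double[frameLen];
            Array.Copy(samples, start, frame, 0, frameLen);
            frames.Add(frame);
        }
        return frames;
    }

    static void Main()
    {
        var signal = new double[16000];          // one second of 16 kHz audio
        var frames = Frame(signal, 16000, 25, 10);
        Console.WriteLine(frames.Count);         // number of 25 ms frames
    }
}
```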
Visual Studio supports plug-ins that extend its functionality, including support for
source-control systems (such as Subversion) and the addition of new toolsets like editors and
visual designers for domain-specific languages or toolsets for other aspects of the software
development lifecycle (like the Team Foundation Server client: Team Explorer).
Visual Studio supports different programming languages and allows the code editor and
debugger to support (to varying degrees) nearly any programming language, provided a
language-specific service exists. Built-in languages include C, C++ and C++/CLI (via Visual
C++), VB.NET (via Visual Basic .NET), C# (via Visual C#), and F# (as of Visual Studio
2010). Support for other languages such as M, Python, and Ruby among others is available via
language services installed separately. It also supports XML/XSLT, HTML/XHTML,
JavaScript and CSS.
Microsoft provides "Express" editions of Visual Studio at no cost. Commercial versions of
Visual Studio, along with select past versions, are available for free to students via Microsoft's
DreamSpark program.
Construct Prompts
Using methods of the PromptBuilder class, you can create prompts consisting of text, SSML
markup, or prerecorded audio files. PromptBuilder also allows you to select a speaking voice
and to control attributes of the voice such as rate and volume. See Construct and Speak a
Simple Prompt and Construct a Complex Prompt for more information and examples.
Initialize and Manage the Speech Synthesizer
The SpeechSynthesizer class provides access to the functionality of a TTS engine in Windows
Vista, Windows 7, and in Windows Server 2008. Using the SpeechSynthesizer class, you can
select a speaking voice, specify the output for generated speech, create handlers for events that
the speech synthesizer generates, and start, pause, and resume speech generation.
Generate Speech
Using methods on the SpeechSynthesizer class, you can generate speech as either a
synchronous or an asynchronous operation from text, SSML markup, files containing text or
SSML markup, and prerecorded audio files.
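A minimal sketch of synchronous and asynchronous speech generation with SpeechSynthesizer (the spoken text is illustrative):

```csharp
using System.Speech.Synthesis;  // requires a reference to System.Speech.dll

class SpeakDemo
{
    static void Main()
    {
        var synth = new SpeechSynthesizer();
        synth.SetOutputToDefaultAudioDevice();

        // Synchronous: blocks until the prompt has been spoken.
        synth.Speak("Hello, I am your assistant.");

        // Asynchronous: returns immediately while speech continues.
        Prompt pending = synth.SpeakAsync("Checking the weather now.");
    }
}
```

The asynchronous form is what an assistant loop would normally use, so that recognition can continue while a response is being spoken.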
Respond to Events
When generating synthesized speech, the SpeechSynthesizer raises events that inform a
speech application about the beginning and end of the speaking of a prompt, the progress of a
speak operation, and details about specific features encountered in a prompt. EventArgs
classes provide notification and information about events raised and allow you to write
handlers that respond to events as they occur.
Control Voice Characteristics
To control the characteristics of speech output, you can select a voice with specific attributes
such as language or gender, modify properties of the SpeechSynthesizer such as rate and
volume, or add instructions, either in prompt content or in separate lexicon files, that guide
the pronunciation of specified words or phrases.
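The voice-selection and property adjustments just described can be sketched as follows (the specific rate and volume values are illustrative):

```csharp
using System.Speech.Synthesis;  // requires a reference to System.Speech.dll

class VoiceDemo
{
    static void Main()
    {
        var synth = new SpeechSynthesizer();

        // Pick a voice by attributes rather than by name.
        synth.SelectVoiceByHints(VoiceGender.Female, VoiceAge.Adult);

        synth.Rate = -2;     // slightly slower than normal (range -10 .. 10)
        synth.Volume = 80;   // range 0 .. 100

        synth.Speak("This voice has been slowed down slightly.");
    }
}
```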
Apart from the analysis, some manual scripts can help in answering the most common
questions without the trouble of creating a process.
Chapter 6
6. Implementation and Coding
6.1 Post Query Design
In Visual C#, you can use either the Windows Forms Designer or the Windows Presentation
Foundation (WPF) Designer to quickly and conveniently create user interfaces.
Although you can also create your UI by manually writing your own code, designers enable
you to do this work much faster.
Adding Controls
In either designer, you use the mouse to drag controls, which are components with visual
representation such as buttons and text boxes, onto a design surface. As you work visually, the
Windows Forms Designer translates your actions into C# source code and writes them into a
project file that is named name.designer.cs where name is the name that you gave to the form.
Similarly, the WPF designer translates actions on the design surface into Extensible
Application Markup Language (XAML) code and writes it into a project file that is named
Window.xaml. When your application runs, that source code (Windows Forms) or XAML
(WPF) will position and size your UI elements so that they appear just as they do on the
design surface.
Setting Properties
After you add a control to the design surface, you can use the Properties window to set its
properties, such as background color and default text.
In the Windows Forms Designer, the values that you specify in the Properties window are the
initial values that will be assigned to that property when the control is created at run time. In
the WPF Designer, the values that you specify in the Properties window are stored as attributes
in the window's XAML file.
In many cases, those values can be accessed or changed programmatically at run time by
getting or setting the property on the instance of the control class in your application. The
Properties window is useful at design time because it enables you to browse all the properties,
events, and methods supported on a control.
Handling Events
Programs with graphical user interfaces are primarily event-driven. They wait until a user does
something such as typing text into a text box, clicking a button, or changing a selection in a
list box. When that occurs, the control, which is just an instance of a .NET Framework class,
sends an event to your application. You have the option of handling an event by writing a
special method in your application that will be called when the event is received.
You can use the Properties window to specify which events you want to handle in your code.
Select a control in the designer and click the Events button, with the lightning bolt icon, on the
Properties window toolbar to see its events.
When you add an event handler through the Properties window, the designer automatically
writes the empty method body. You must write the code to make the method do something
useful. Most controls generate many events, but frequently an application will only have to
handle some of them, or even only one. For example, you probably have to handle a button's
Click event, but you do not have to handle its SizeChanged event unless you want to do
something when the size of the button changes.
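Wiring such a handler by hand follows the same pattern the designer generates; for a button's Click event, a sketch (class and member names here are illustrative) is:

```csharp
using System;
using System.Windows.Forms;

class ClickDemo : Form
{
    private Button button1;

    public ClickDemo()
    {
        button1 = new Button();
        button1.Text = "OK";
        // Subscribe the handler; the designer writes an equivalent line
        // when you choose the event in the Properties window.
        button1.Click += new EventHandler(button1_Click);
        Controls.Add(button1);
    }

    // The designer generates an empty method of this shape;
    // filling in the body is up to you.
    private void button1_Click(object sender, EventArgs e)
    {
        MessageBox.Show("Button clicked");
    }
}
```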
When you write an event handler yourself, make sure that parameters of the correct type are
used. The handler for the SpeechRecognized event shown in the following example displays
the text of the recognized word or phrase using the Result property on the
SpeechRecognizedEventArgs parameter, e.
void sr_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    MessageBox.Show(e.Result.Text);
}
The System.Speech.Synthesis namespace has been used to synthesize speech; a minimal
program looks like this:

using System;
using System.Speech.Synthesis;

namespace SampleSynthesis
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a synthesizer and send its output to the default audio device.
            SpeechSynthesizer synth = new SpeechSynthesizer();
            synth.SetOutputToDefaultAudioDevice();
            synth.Speak("This is a sample of speech synthesis.");  // sample phrase

            Console.WriteLine();
            Console.WriteLine("Press any key to exit...");
            Console.ReadKey();
        }
    }
}
System.Diagnostics.Process.Start can be used to execute the commanded text. Its fullest
overload is:

public static Process Start(
    string fileName,
    string arguments,
    string userName,
    SecureString password,
    string domain
)
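For most commands the simpler overloads suffice; for example, launching an application named in the recognized text (the program names below are illustrative, and assume a Windows machine):

```csharp
using System.Diagnostics;

class LaunchDemo
{
    static void Main()
    {
        // Launch Notepad with no arguments.
        Process.Start("notepad.exe");

        // Launch Paint, as for the "paint" voice command.
        Process.Start("mspaint.exe");
    }
}
```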
The recognized voice commands include: no, back, new folder, take screenshot, paint,
go up, go down, save, save as, delete, cut, away, reload, start presentation, next slide,
previous slide, end presentation, zoom in, and hold control.
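Commands like these can be dispatched with a simple switch over the recognized text. One sketch, using System.Windows.Forms.SendKeys to send keyboard shortcuts to the foreground window (the particular shortcut strings and class names here are our own assumptions, not the project's exact mapping):

```csharp
using System.Windows.Forms;

static class CommandDispatcher
{
    // Map a recognized phrase to the keystroke string SendKeys should send,
    // or null when the phrase is not a keystroke command.
    public static string ToKeystrokes(string command)
    {
        switch (command)
        {
            case "save":           return "^(s)";   // Ctrl+S
            case "delete":         return "{DEL}";
            case "next slide":     return "{RIGHT}";
            case "previous slide": return "{LEFT}";
            default:               return null;
        }
    }

    public static void Dispatch(string command)
    {
        string keys = ToKeystrokes(command);
        if (keys != null)
            SendKeys.SendWait(keys);  // deliver the keystrokes to the active window
    }
}
```

Keeping the phrase-to-keystroke mapping in one pure method makes it easy to extend with new commands and to test without a running UI.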
6.5 RSS_Reader
using System;
using System.Linq;
using System.Text;
using CustomizeableJarvis.Properties;
using System.Xml;
using System.Xml.Linq;
using System.Net;
namespace CustomizeableJarvis
{
class RSSReader
{
    public static void CheckForEmails()
    {
        string GmailAtomUrl = "https://mail.google.com/mail/feed/atom";
        XmlUrlResolver xmlResolver = new XmlUrlResolver();
        xmlResolver.Credentials = new NetworkCredential(Settings.Default.GmailUser,
            Settings.Default.GmailPassword);
        XmlTextReader xmlReader = new XmlTextReader(GmailAtomUrl);
        xmlReader.XmlResolver = xmlResolver;
        try
        {
            XNamespace ns = XNamespace.Get("http://purl.org/atom/ns#");
            XDocument xmlFeed = XDocument.Load(xmlReader);

            // Fetch the current weather from the Yahoo! Weather RSS feed.
            string query = String.Format("http://weather.yahooapis.com/forecastrss?w=" +
                Settings.Default.WOEID.ToString() + "&u=" + Settings.Default.Temperature);
            XmlDocument wData = new XmlDocument();
            wData.Load(query);
            XmlNamespaceManager man = new XmlNamespaceManager(wData.NameTable);
            man.AddNamespace("yweather", "http://xml.weather.yahoo.com/ns/rss/1.0");
            XmlNode channel = wData.SelectSingleNode("rss").SelectSingleNode("channel");
            XmlNodeList nodes = wData.SelectNodes("/rss/channel/item/yweather:forecast",
                man);
            frmMain.Temperature =
                channel.SelectSingleNode("item").SelectSingleNode("yweather:condition",
                man).Attributes["temp"].Value;
            frmMain.Condition =
                channel.SelectSingleNode("item").SelectSingleNode("yweather:condition",
                man).Attributes["text"].Value;
            frmMain.Humidity =
                channel.SelectSingleNode("yweather:atmosphere",
                man).Attributes["humidity"].Value;
            frmMain.WinSpeed =
                channel.SelectSingleNode("yweather:wind",
                man).Attributes["speed"].Value;
            frmMain.Town =
                channel.SelectSingleNode("yweather:location",
                man).Attributes["city"].Value;
            frmMain.TFCond =
                channel.SelectSingleNode("item").SelectSingleNode("yweather:forecast",
                man).Attributes["text"].Value;
            frmMain.TFHigh =
                channel.SelectSingleNode("item").SelectSingleNode("yweather:forecast",
                man).Attributes["high"].Value;
            frmMain.TFLow =
                channel.SelectSingleNode("item").SelectSingleNode("yweather:forecast",
                man).Attributes["low"].Value;
            frmMain.QEvent = "connected";
        }
        catch { frmMain.QEvent = "failed"; }
    }
    public static void CheckBloggerForUpdates()
    {
        if (frmMain.QEvent == "UpdateYesNo")
        {
            frmMain.Jarvis.SpeakAsync("There is a new update available. Shall I start the download?");
        }
        else
        {
            String UpdateMessage;
            String UpdateDownloadLink;
            string AtomFeedURL = "http://google.com";
            XmlUrlResolver xmlResolver = new XmlUrlResolver();
            XmlTextReader xmlReader = new XmlTextReader(AtomFeedURL);
            xmlReader.XmlResolver = xmlResolver;
            XNamespace ns = XNamespace.Get("http://www.w3.org/2005/Atom");
            // ... (the feed parsing that sets UpdateMessage and
            //      UpdateDownloadLink is elided here) ...
            frmMain.Jarvis.SpeakAsyncCancelAll();
            frmMain.Jarvis.SpeakAsync("Would you like me to download the update?");
            frmMain.QEvent = "UpdateYesNo";
            Properties.Settings.Default.RecentUpdate = UpdateDownloadLink;
            Properties.Settings.Default.Save();
        }
    }
}
}
Chapter 7
7.1 Conclusion and Future Work
In this project we presented a simple mechanism that could eliminate the excess use of
Natural Language Processing, taking us another step closer to an ideal expert voice
assistant. However, there is still a lot of scope for research on this topic, and the Switch
State Mechanism offers only a partial solution: it addresses the responsiveness issue, that
is, the computation time needed to understand a command.
In this project, the Expert System Voice Assistant mainly uses human communication
means such as Twitter, instant messaging, and voice to create a two-way connection
between a human and his computer: controlling it and its applications, and notifying him
of breaking news, Facebook notifications, and more. Since we mainly use voice as the
communication means, ESVA is essentially a speech recognition application. The concept
of speech technology really encompasses two technologies: synthesis and recognition. A
speech synthesizer takes text as input and produces an audio stream as output. A speech
recognizer does the opposite: it takes an audio stream as input and turns it into a text
transcription. The voice is a signal carrying an enormous amount of information, so
directly analyzing and synthesizing the raw voice signal is impractical. Therefore, digital
signal processes such as feature extraction and feature matching are introduced to
represent the voice signal. In this project we directly use a speech engine whose feature
extraction technique is the mel-scaled frequency cepstrum. The mel-scaled frequency
cepstral coefficients (MFCCs), derived from Fourier transform and filter bank analysis,
are perhaps the most widely used front-ends in state-of-the-art speech recognition
systems. Our aim is to create more and more functionality that can assist humans in their
daily life and reduce their effort. In our tests we checked that all of this functionality
works properly.
In the future this is going to be one of the most prominent technologies in the technical
world. This application might not yet fulfil every command a user wants, but the
supported commands can grow in range and form, and language support can be extended
as well.
This project delivers most of the things that were promised, and it works with very good
efficiency. ESVA helped us learn a lot about speech recognition, speech synthesis, and
system processes and operations. There are still many possibilities in the field of speech
and artificial intelligence: it can go beyond the expected human-machine interaction and
deliver what we see in science fiction.
Chapter 8
References
[1] Siri Intelligent Personal Assistant for iOS. Available at http://www.apple.com/ios/siri/
[2] Google Now Intelligent Personal Assistant for Android. Available at
http://www.google.com/landing/now/
[3] Project by Chad Barraford. Available at http://projectjarvis.com/
[4] Project Alpha. Available at http://alphawsr.tumblr.com/faqs
[5] Bahl, L.R.; Brown, P.F.; de Souza, P.V.; Mercer, R.L.; "Speech recognition with
continuous-parameter Hidden Markov models"; Acoustics, Speech, and Signal Processing,
1988 (ICASSP-88), 1988 International Conference on, vol. 1, pp. 40-43, 11-14 Apr 1988.
[6] Christopher D. Manning, Hinrich Schütze; Foundations of Statistical Natural Language
Processing; MIT Press, 1999.
[7] E.J. O'Neil, P.E. O'Neil, and G. Weikum; The LRU-K page replacement algorithm for
database disk buffering; In Proceedings of the 1993 ACM SIGMOD International Conference
on Management of Data, pages 297-306, 1993.
[8] Ciprian Chelba, Dan Bikel, Maria Shugrina, Patrick Nguyen, Shankar Kumar; Large
Scale Language Modeling in Automatic Speech Recognition; Available at
http://arxiv.org/abs/1210.8440