I. ABSTRACT
II. PROJECT PROPOSAL PLAN
II.1. INTRODUCTION
II.2. DESIGN REQUIREMENTS
II.2.1. Functional Description of the Design and its Components
II.2.2. Technical Description of the Design and its Components
II.2.3. Mathematical or Other Principles Embedded in the Project
II.2.4. Performance Expectations/Objectives
II.3. DESIGN APPROACHES
II.4. FINANCIAL BUDGET
II.5. PROJECT SCHEDULE
III. CONCLUSION
IV. REFERENCES
Tables:
TABLE A: COST OF PARTS
TABLE B: COST OF PROJECT
Figures:
FIGURE 1: DIAGRAM OF THE PROJECT
FIGURE 2: BLOCK DIAGRAM OF PROJECT
FIGURE 3: EXAMPLE OF A DSP
FIGURE 4: PDA TO BE USED
FIGURE 5: TYPICAL SPEECH RECOGNITION PROCESSES
FIGURE 6: BREAKDOWN OF TASKS
FIGURE 7: GANTT CHART OF PROJECT
I. Abstract
The Voice Activated Remote Control project addresses the needs of people who
do not like to search for the remote control or do not have the energy to walk up to the
television or any other device that uses a remote control. The project aims to
create a device that accepts audio input and sends a corresponding signal to
a second device, placed atop the equipment being controlled, which performs the required task.
We will develop an application that runs on a device such as a computer or
PDA and sends the signal to a set-top device that we will build. At the
conclusion of this project, we will have a set-top device that receives Bluetooth
signals from any device that supports Bluetooth, and software that enables
Bluetooth-enabled PDAs to take voice commands and transfer them to our device. By
creating a separate set-top box, we will be able to make the product compatible with
II.1. Introduction
Our project is a voice-activated remote control. It entails putting together a device
that will be able to control a television set using voice commands. Instead of the
traditional infrared remote control, we are planning on extending its transmit range by
either that of a PDA or a DSP, will be used to analyze the voice commands given by the
user. It will then send the command via its attached Bluetooth transmitter. At the other
end, by the television, there will be a customized Bluetooth receiver to receive the signal.
Finally, it converts the RF signal into a compatible infrared signal to be sent by a modified
remote control.
Although our project scope will only focus on controlling a television set, the
project can be modified for a number of applications. One example is a voice-activated
garage door opener. The driver will no longer have to take his or her eyes off the road to
press a button to open the garage door. Another application would be a voice-activated
Currently, the group aims to develop a prototype using two laptops connected via
Bluetooth. We will develop an interface for users to speak to and use a program to
analyze the voice. The command will then be transferred to a module on the TV, which
converts the command to infrared. After a working prototype has been successfully
developed, we will move towards a PDA. Finally, if time permits, we will build a
remote control using a DSP chip. The technical and financial details of this project will
Figure 2: Block Diagram of Project
Shack, specifically Catalog number 33-3026. The microphone will be connected to the
processor by a standard RCA jack (or directly to the board we are working with), which
will be connected to the appropriate pins or inputs of the processor.
The microprocessor chip will simply be placed inside a socket, or otherwise connected so
as to make replacement easier in case the chip is damaged. We plan to use a PDA as
our processor at first and, if successful, to upgrade to a standalone DSP chip and
microprocessor combination. The DSP will look something like the one
pictured below. The PDA we plan to use is a Toshiba e740, which is also pictured below.
Figure 3: Example of a DSP
An example of the DSPs that we are planning on using is TI’s TMS320C54x line.
The DSP will be programmed to do voice recognition after which it will output to a
microcontroller which in turn will convert the interpreted command into a Bluetooth
signal using the appropriate protocol options. The Bluetooth device is a Class 2 Bluetooth
transmitter, which means its maximum output power is 2.5 mW (4 dBm) and its typical range is
approximately 10 m. This class was chosen so as not to cause interference with any
with the end user. An example of a speaker that can be used is from the following
computer speaker. This was chosen because it is very cheap and meets the objectives of
being an effective communication medium with the end user. The two wires from the
speakers will be directly soldered to the microcontroller socket to reduce the size of the
housing for the speaker, microphone, DSP, its socket, and the Bluetooth transmitter,
Before we implement this setup we will have an intermediate step in which we use a
PDA running the Pocket PC operating system. This will serve exactly the same function
as the DSP connected to a microcontroller and a Bluetooth module. The PDA will serve
as a kind of simulation environment for the actual DSP. If that approach works
better, we will leave the solution as is. This PDA will be Bluetooth enabled and will
On the receiving end, another Bluetooth module will be used; however, it will be set to
receive the signal from the transmitter. This receiving module will be directly connected to the
microcontroller, which will interpret the Bluetooth signal and generate the appropriate
infrared signal for the device that is being operated (for example, the television).
microcontroller is found on the following website:
The output from the microcontroller will be connected to an infrared transmitter, which
we can obtain by purchasing a cheap remote control and removing its IR transmitter. The IR
transmitter will be placed in front of the television or other device and will transmit the
appropriate signals to the device. The transmitter will be soldered directly to the
microcontroller socket.
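The conversion step on the receiving microcontroller can be sketched as follows. This is only an illustration under stated assumptions: the proposal does not name an infrared protocol, so the widely used NEC consumer-IR framing is assumed here, timings are rounded to whole microseconds, and the command table and its address and command bytes are hypothetical placeholders rather than codes for any real television.

```python
# Hypothetical command table: the (address, command) bytes are placeholders,
# not the codes of any real television.
IR_CODES = {
    "power": (0x04, 0x08),
    "channel_up": (0x04, 0x00),
    "volume_up": (0x04, 0x02),
}

# NEC-style timing in microseconds (assumed protocol, rounded values).
LEADER_MARK, LEADER_SPACE = 9000, 4500
BIT_MARK, ZERO_SPACE, ONE_SPACE = 562, 562, 1687


def nec_frame_bits(address, command):
    """Return the 32 frame bits: address, inverted address, command,
    inverted command, each byte sent least significant bit first."""
    payload = [address, address ^ 0xFF, command, command ^ 0xFF]
    return [(byte >> i) & 1 for byte in payload for i in range(8)]


def nec_pulse_train(address, command):
    """Return a list of (mark_us, space_us) pairs for one frame."""
    pulses = [(LEADER_MARK, LEADER_SPACE)]
    for bit in nec_frame_bits(address, command):
        pulses.append((BIT_MARK, ONE_SPACE if bit else ZERO_SPACE))
    pulses.append((BIT_MARK, 0))  # final burst marks the end of the frame
    return pulses


def command_to_pulses(name):
    """Look up a received command name and build its pulse train."""
    address, command = IR_CODES[name]
    return nec_pulse_train(address, command)
```

On real hardware each mark would be emitted as a burst of the IR carrier and each space as silence; the actual codes would have to be taken from the remote control being replaced.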
remote control. First we have to understand how voice activation works. Then we also
Voice recognition is the process of taking the spoken word as an input to a computer
program. This process is important to our product, the voice-activated remote control,
because it provides a fairly natural and intuitive way of controlling the channels on the
television while sparing the user the distraction of
looking for the remote control. Here we will discuss the principles and concepts behind
voice recognition. Although most of the groundwork has already been done for us, and
writing a voice recognition algorithm is beyond our means, we believe we need
to understand at least the basics of how it works in order to apply it to
our project.
What is voice recognition, and why is it useful in our project?
Voice recognition is "the technology by which sounds, words or phrases spoken by
humans are converted into electrical signals, and these signals are transformed into
coding patterns to which meaning has been assigned" [ADA90]. While the concept could
more generally be called "sound recognition", we focus here on the human voice because
we most often and most naturally use our voices to communicate our ideas to others in
our immediate surroundings. In the context of our product, a voice-activated remote
control, the user would be most comfortable using their most common form of
communication, the voice, rather than pressing buttons. The difficulty in using voice as
speech and the more traditional forms of computer input. While computer programs are
commonly designed to produce a precise and well-defined response upon receiving the
proper (and equally precise) input, the human voice and spoken words are anything but
precise. Each human voice is different, and identical words can have different meanings
if spoken with different inflections or in different contexts. Several approaches have been
"template matching" and "feature analysis". Template matching is the simplest technique
and has the highest accuracy when used properly, but it also suffers from the most
limitations. As with any approach to voice recognition, the first step is for the user to
speak a word or phrase into a microphone. The electrical signal from the microphone is
determine the "meaning" of this voice input, the computer attempts to match the input
with a digitized voice sample, or template, that has a known meaning. This technique is a
close analogy to the traditional command inputs from a keyboard. The program contains
the input template, and attempts to match this template with the actual input using a
Since each person's voice is different, the program cannot possibly contain a template for
each potential user, so the program must first be "trained" with a new user's voice input
before that user's voice can be recognized by the program. During a training session, the
program displays a printed word or phrase, and the user speaks that word or phrase
several times into a microphone. The program computes a statistical average of the
multiple samples of the same word and stores the averaged sample as a template in a
program data structure. With this approach to voice recognition, the program has a
"vocabulary" that is limited to the words or phrases used in the training session, and its
user base is also limited to those users who have trained the program. This type of system
is known as "speaker dependent." It can have vocabularies on the order of a few hundred
words and short phrases, and recognition accuracy can be about 98 percent.
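The training and matching steps described above can be sketched in a few lines. This is a minimal illustration, not the algorithm of any particular recognition product: it assumes each utterance has already been reduced to a fixed-length list of numeric features, and the class name, the Euclidean distance measure, and the rejection threshold are all assumptions made for the example.

```python
import math


def average_template(samples):
    """Average several equal-length feature vectors into one template."""
    count = len(samples)
    return [sum(values) / count for values in zip(*samples)]


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TemplateRecognizer:
    """Speaker-dependent recognizer: one averaged template per trained word."""

    def __init__(self, reject_threshold):
        self.templates = {}
        self.reject_threshold = reject_threshold

    def train(self, word, samples):
        """Store the average of the user's repeated samples of a word."""
        self.templates[word] = average_template(samples)

    def recognize(self, features):
        """Return the closest trained word, or None if nothing is close enough."""
        if not self.templates:
            return None
        word, template = min(
            self.templates.items(),
            key=lambda item: distance(features, item[1]),
        )
        if distance(features, template) <= self.reject_threshold:
            return word
        return None
```

The rejection threshold is what keeps an untrained word from being forced onto the nearest template; a practical system would also need to align utterances of different lengths before comparing them.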
A more general form of voice recognition is available through feature analysis and this
find an exact or near-exact match between the actual voice input and a previously stored
voice template, this method first processes the voice input using "Fourier transforms" or
"linear predictive coding (LPC)", then attempts to find characteristic similarities between
the expected inputs and the actual digitized voice input. These similarities will be present
for a wide range of speakers, and so the system need not be trained by each new user. The
types of speech differences that the speaker-independent method can deal with, but which
pattern matching would fail to handle, include accents, and varying speed of delivery,
very difficult, with some of the greatest hurdles being the variety of accents and
can handle only discrete words, connected words, or continuous speech. Most voice
recognition systems are discrete word systems, and these are easiest to implement. For this
type of system, the speaker must pause between words. This is fine for situations where
the user is required to give only one word responses or commands, but is very unnatural
for multiple word inputs. In a connected word voice recognition system, the user is
allowed to speak in multiple word phrases, but he or she must still be careful to articulate
each word and not slur the end of one word into the beginning of the next word. Totally
natural, continuous speech includes a great deal of "co-articulation", where adjacent words
run together without pauses or any other apparent division between words. A speech
recognition system that handles continuous speech is the most difficult to implement.
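A discrete word system can rely on the pauses between words to find word boundaries. The sketch below shows one simple way to do that, assuming the audio has already been cut into short frames with one energy value per frame; the function name, the silence threshold, and the minimum pause length are hypothetical parameters chosen for illustration.

```python
def split_words(frame_energies, silence_threshold, min_pause_frames):
    """Return (start, end) frame index pairs, end exclusive, one per word.

    A word ends once at least min_pause_frames consecutive frames fall at or
    below silence_threshold; shorter dips are treated as part of the word.
    """
    words = []
    start = None      # index of the first frame of the current word
    quiet_run = 0     # consecutive quiet frames seen inside the current word
    for i, energy in enumerate(frame_energies):
        if energy > silence_threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_pause_frames:
                words.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # Input ended mid-word: close it, dropping any trailing quiet frames.
        words.append((start, len(frame_energies) - quiet_run))
    return words
```

Continuous speech defeats this approach precisely because co-articulated words leave no quiet frames to detect.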
While designing our project we need to consider all these aspects in deciding which type
As it stands, we only need a discrete word system, or perhaps a connected word
recognition system. Also, we plan on having the voice recognition software work for
many users without having to train the system for each different user.
What disciplines are involved in voice recognition?
The template matching method of voice recognition is founded in the general principles
linguistics, and digital signal processing will also be looked at to gain further insight into
(http://www.hitl.washington.edu/scivw/EVE/I.D.2.d.VoiceRecognition.html)
The next portion describes the basic ideas we need in order to build the part of the system
that takes the information processed by the DSP/PDA, outputs it as a Bluetooth signal, and
delivers it to the television. We need to examine at least the four core protocols: baseband,
link manager, logical link control and adaptation, and service discovery.
The baseband and control layer enables the physical radio frequency link
between Bluetooth systems. It synchronizes all the transmissions to ensure
that no data is lost or cut off during the frequency hopping. Baseband also provides
two different kinds of physical links, SCO and ACL. We will be using the Asynchronous
The Link Manager Protocol is responsible for setting up the link between the Bluetooth
devices. It manages important aspects such as authentication and encryption. It also
controls the duty cycles and the connection states of a Bluetooth unit in a piconet.
The Logical Link Control and Adaptation Protocol (L2CAP) adapts the upper layer
protocols (protocols that are not among the four core protocols) to the baseband. It is only
supported for ACL. We don’t believe we will need to use this protocol, because we
won’t be using upper layer protocols. However, we do need to look at
these protocols in detail to make sure we understand exactly how each one is used.
Finally, the fourth core protocol is Service Discovery, which is crucial to the
Bluetooth framework. This protocol ensures that a connection between two or more
(http://www.bluetooth.com)
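Whichever protocol layers end up carrying the data, the payload for a single remote-control command can be very small. The sketch below shows one possible packet layout. Everything in it is an assumption made for illustration: the 4-byte format, the start byte, the command identifiers, and the additive checksum are hypothetical and are not part of the Bluetooth specification or of any existing product.

```python
import struct

# Hypothetical command identifiers shared by the sender and the set-top device.
COMMAND_IDS = {"power": 1, "channel": 2, "volume_up": 3, "volume_down": 4}
COMMAND_NAMES = {value: name for name, value in COMMAND_IDS.items()}
START_BYTE = 0xA5  # arbitrary marker for the start of a packet


def encode_command(name, argument=0):
    """Pack a command into 4 bytes: start, command id, argument, checksum."""
    body = struct.pack("BBB", START_BYTE, COMMAND_IDS[name], argument)
    checksum = sum(body) & 0xFF
    return body + bytes([checksum])


def decode_command(packet):
    """Validate a 4-byte packet and return (command name, argument)."""
    if len(packet) != 4:
        raise ValueError("packet must be 4 bytes")
    start, command_id, argument, checksum = struct.unpack("BBBB", packet)
    if start != START_BYTE or checksum != (sum(packet[:3]) & 0xFF):
        raise ValueError("corrupted packet")
    if command_id not in COMMAND_NAMES:
        raise ValueError("unknown command")
    return COMMAND_NAMES[command_id], argument
```

For example, encode_command("channel", 5) would be sent by the PDA or laptop, and the set-top device would call decode_command on the received bytes before generating the infrared signal.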
We need to delve deeper into understanding these protocols and then also understand the
II.2.4. Performance Expectations/Objectives
The first performance objective associated with our project is the response time between
the issuance of a voice command and the execution of that command. Obviously, the
lower the response time, the better the system performs. The component that will be
primarily associated with this performance metric is the Voice Analyzer module. The
module will capture and analyze the speech, determine whether it is a command, and
finally convert the command into a radio frequency (Bluetooth) signal. All of these
operations may increase the response time significantly, especially if the module is not properly
designed.
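One way to see which operation dominates the response time is to time each stage separately. The helper below is a minimal sketch of that idea; the stage names and functions passed to it are placeholders supplied by the caller, not parts of an existing Voice Analyzer implementation.

```python
import time


def time_stages(stages, payload):
    """Run each (name, function) stage in order, feeding each result to the
    next stage, and record how long every stage takes in seconds."""
    timings = {}
    for name, stage in stages:
        started = time.perf_counter()
        payload = stage(payload)
        timings[name] = time.perf_counter() - started
    timings["total"] = sum(timings.values())
    return payload, timings
```

Calling it with stages such as capture, analyze, and transmit would show how much of the delay between speaking a command and sending it comes from each step.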
Another performance objective that we can use to evaluate our system is its sensitivity in
analyzing the voice. Voice and speech patterns vary from one person to another. Our
system has to be able to accommodate these differences in order to be considered successful. That
is, someone who has an accent should also be able to use this system with a large degree
of satisfaction. Likewise, the system should be able to analyze both low- and high-pitched
voices without any problems. Also, given that there may be background noise coming
from the TV itself, the system should be able to recognize a command in the presence of
noise.
The third performance objective we’d like to ensure is accuracy. When the user of this
system issues a voice command, for example “Change to Channel 5”, the device
should accurately carry out the operation. If the device instead increases the volume 5 out of the 10
times the above command is given, then it is not performing very well. Accuracy, in our
opinion, is the most important performance metric of all. The reason being the user of the
• The speech recognizer should recognize the user's voice properly at least 90% of
the time.
• It should recognize the commands despite a relatively low level of noise coming
from the background and the TV itself. Note: In order to lower the relative level
• 99% of the time when the signal is transmitted from the Bluetooth base, the receiver
should receive the proper signal to send to the TV. That is, once the DSP/PDA
has the voice interpreted properly, the TV or device being controlled should
• The D/A converter must properly interpret the signal from Bluetooth 100% of the
time.
• The time between the issuance of a command and the execution of the command
should appear instantaneous to the user. In the worst case, the user should not
choosing and finalizing a design project. It helped us a lot by giving us plenty of time to
think of different design approaches; we also had a lot of time to do research on the
resources that are available to do our job. Our project is called the Voice Activated Remote
Control; in more detail, it is a touchless remote control that will
employ Bluetooth technology for transmission in place of infrared signals. As our advisor
predicted, there is a plethora of ways to accomplish our design goals; at this early stage
of the senior design we have already changed some of the items that we agreed on during
Design VI. We are quite confident that some of these things we agreed on will eventually
When we first finalized our project in Design VI, we agreed to use a commercially
available DSP chip as the baseline. We would then assemble this chip with a Bluetooth
transmitting chip onto a circuit board. The second step would be to program the DSP chip
to recognize voice commands. The third step would be to build another circuit board that
would consist of a microcontroller, a Bluetooth receiver, and an infrared transmitter. The idea
is that the microcontroller will take the signal from the Bluetooth receiver and convert it
to infrared.
The benefits of this approach are that we would learn a lot more about
circuit design and digital signal processing. The risk, however, is whether a suitable
DSP can be selected and will work according to specifications. Since a DSP chip doesn’t
necessarily provide enough incentive for a company to provide a full scale of support, it
might be difficult to get a problem fixed when one is encountered. Also, sometimes the
company that manufactures these DSP chips might not have the product readily
available; what this translates to is that we might have to wait a couple of months
between placing the order and actually receiving the DSP chip. Another risk would be the
programming of the DSP chip to recognize voice commands. The programming portion
of the project would be very intensive and demanding. Even with an elite set of
might not be feasible for us to program the DSP chip from scratch. Another area of
uncertainty is that none of our group members has prior experience in Bluetooth
technology; it might take us some time to experiment with this technology and be able to
utilize it to achieve our goal. Finally, the infrared portion might present some problems;
we are not sure whether we can program the DSP chip on the receiving end to convert
replacing the DSP chip. With this approach, we’ll be able to save ourselves a lot of
waiting time and money, because we already have laptops available to work with;
we wouldn’t need to waste any time waiting for the delivery of the DSP chip, and we would
not have to purchase an extra piece of hardware. Another benefit of this approach is
that there are plenty of speech recognition development kits available from vendors
such as Microsoft and IBM. It would just be a matter of downloading and installing the
development kit. In this approach, we would focus our effort on programming our
application in the Windows environment to recognize commands and then send the
command via the Bluetooth transmitter. Our laptops are equipped with a USB port and an
infrared port; this will help us immensely on the receiving end because the Bluetooth
receiver would have a USB interface. It would be connected to the laptop and would be
responsible for receiving Bluetooth signals sent by the transmitter. On the receiving end,
we would write a program to take the Bluetooth signal, convert it to an infrared signal,
send it via the infrared port, and finally control the television set. Just like the first
approach, this one has uncertainties that we will face. However, we think that
the uncertainty in this approach is significantly less than in the DSP approach.
The uncertainties that might arise include whether we would be able to find a
recognition is very critical to the success of our project; the final product has to be able to
engine would be how accurately the program could recognize the speech. A successful
project would be one that takes fewer than three tries to recognize and execute a command.
Another thing that is noteworthy is that we were told by our professor that Bluetooth
might be replaced by Ultra Wide Band technology. If this is the case, by the time we
complete our project, it might not have any market value. However, we cannot migrate
to Ultra Wide Band technology at this point in time since the technology has not yet been
standardized and the only vendor that has a purchasable product is not following industry
standards. The same risk that we have in the first approach in the conversion from
Bluetooth to infrared signals holds for this approach; we cannot be sure that we will be
The third approach that we have in mind is to use PDAs running Windows CE
(PocketPC) in place of the laptops or DSP. We would have two PDAs, of which one
would be the transmitter and the other the receiver. The sender would be responsible for
recognizing voice commands and sending the command information via Bluetooth; just like
the previous two approaches, the receiver would be responsible for receiving the
Bluetooth signal and converting it to an infrared signal. We think that the PDA approach
might have a higher level of risk and would cost more money to implement (since we do
not have PDAs and the SDK would be costly). Nevertheless, this would be the approach
of choice; the size of the PDAs would be the closest to that of a regular remote control. It would
be more challenging since programming in the PDA environment would be new to some
of us. It would also be more interesting to work with a platform that is not commonplace.
The biggest concern we had was finding an SDK for Windows CE. After more research,
however, we found that these SDKs are available from third-party vendors such as IBM
at a relatively high cost. The problem then is whether we can afford the price of the SDK
that is needed to develop the application on the Windows CE platform. The other
concern that we have with this approach is the quality of the microphone that comes with
the PDA; voice recognition often does not work well with poor-quality microphones and we
The cost calculated above is the approximate material cost of building the voice activated
control. It does not include the cost of labor. The estimated amount of time invested into
The group is composed of four members. Each member is expected to put an
equal amount of time into the project. Each member puts in approximately 57.75 man
hours developing this product, costing about $1733. Assuming that each engineer is
paid approximately $30 per hour, the total labor cost for the project is $6930.
Research has taken about 96 hours, or about 2880 dollars. The testing phase is
approximately 16 hours long, costing about 480 dollars. The group is expected to spend
61 hours on documentation.
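The labor figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text (four members, 57.75 hours each, $30 per hour, and the three itemized phases); the variable names are illustrative, and the hours left over after the itemized phases are not broken down in the text, so they are simply reported as a remainder.

```python
HOURLY_RATE = 30          # dollars per hour, as assumed in the text
MEMBERS = 4
HOURS_PER_MEMBER = 57.75

cost_per_member = HOURS_PER_MEMBER * HOURLY_RATE      # 57.75 * 30
total_hours = MEMBERS * HOURS_PER_MEMBER              # 4 * 57.75
total_labor_cost = total_hours * HOURLY_RATE          # total hours * 30

# Phases itemized in the text, in hours.
phase_hours = {"research": 96, "testing": 16, "documentation": 61}
phase_costs = {phase: hours * HOURLY_RATE for phase, hours in phase_hours.items()}

# Hours not itemized in the text.
remaining_hours = total_hours - sum(phase_hours.values())

print(f"Per member: ${cost_per_member:.2f}")
print(f"Total: {total_hours:.0f} hours, ${total_labor_cost:.2f}")
for phase, cost in phase_costs.items():
    print(f"{phase}: {phase_hours[phase]} hours, ${cost}")
print(f"Remaining: {remaining_hours:.0f} hours")
```

The total of 231 hours at $30 per hour reproduces the stated $6930, and the per-member figure works out to $1732.50.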
In a real-world engineering project there are more costs associated with the final project.
The costs of support staff, rent, utilities, overhead, profits, travel, and
accommodation are just a few examples of what needs to be taken into account.
II.5. Project Schedule
ID  Task Name                                      Duration    Start         Finish
4   Superficial research on Ultra Wide Band        8 hrs       Thu 9/25/03   Thu 9/25/03
5   Research various DSP/microprocessors           8 hrs       Thu 10/2/03   Thu 10/2/03
6   Find voice activation tools                    11 days     Tue 10/21/03  Tue 11/4/03
7   Research various voice recognition tools       24 hrs      Tue 10/21/03  Thu 10/23/03
8   Integrate the tools with software              15 hrs      Tue 10/28/03  Wed 10/29/03
9   Troubleshoot                                   8 hrs       Tue 11/4/03   Tue 11/4/03
10  Convert voice activation to Bluetooth          41.88 days  Tue 9/16/03   Wed 11/12/03
11  Research conversion                            26 days     Tue 9/16/03   Tue 10/21/03
12  Research how Bluetooth works with microcontro  8 hrs       Tue 9/16/03   Tue 9/16/03
13  Research how Bluetooth works with PDA          8 hrs       Tue 9/23/03   Tue 9/23/03
14  Research how Bluetooth works with laptop       8 hrs       Tue 10/14/03  Tue 10/14/03
15  Wrap data into Bluetooth frame                 8 hrs       Tue 10/21/03  Tue 10/21/03
16  Voice Bluetooth                                15 hrs      Tue 11/11/03  Wed 11/12/03
17  Send information from sender to receiver       6 days      Tue 10/21/03  Tue 10/28/03
18  Research sending protocols                     8 hrs       Tue 10/21/03  Tue 10/21/03
III. Conclusion
The aim of this project is to create a device and software that will allow a user
to simply speak a command and have the targeted device
perform it. That is, the user will speak into our program, whether on a PDA or
another device, and the command will be transferred to a set-top device, which will send
the command to the television, for example. Essentially, our device and software will
save people time and effort when doing a very mundane task, whether it be changing the
channel, adjusting the volume, or pausing or forwarding a tape. While this may seem like a
novelty product, it can be very helpful for those who are constantly misplacing remote
controls or are too tired after coming home from a long day of work, for example. In the
future, our set-top device will be able to communicate with other Bluetooth devices that
Because of the work that has already been put in by the group and the fast-paced
schedule planned, our project will be completed well ahead of the final project deadline.
This will enable us to work on applying additional features to make the product more
attractive to our customers. By adding more services, such as support for devices beyond
televisions, VCRs, and DVD players, to our device, we will be able to provide more uses
IV. References
http://www.bluetooth.com
http://www.dynamic-living.com/voice_activated_remote.htm
http://www.innotechsystems.com/voicefire.htm
http://www.itsc.state.md.us/ITSC/delvrbles/C-5-1/C_5%20Speech%20Recognition%20in%20the%20SESA%20Call%20Center%20-%20White%20Paper.pdf
http://www.laservision.co.uk/voiceme_remotecontrol.html
http://www.niad.sussex.ac.uk/ezine_issue.cfm?eZineID=5
http://www.sensoryinc.com/html/products/vetoolkit.html
http://www.smarthome.com/8167.html
http://www.techextreme.com/perl/story/15196.html
http://www.troygroup.com/wireless/products/wireless/docs/TROY%20TI%20DSP%20Bluetooth.pdf
http://www.voicemethods.com/
Tutorials:
Bluetooth
http://www.tutorgig.com/searchtgig.jsp?query=bluetooth
Infrared
http://www.tutorgig.com/showurls.jsp?query=infrared