April 2007
The reasons for wanting our computers to be able to understand our voices are seemingly endless. Many of those reasons stem from the fact that all of us, except maybe the very most skilled typists, can speak more quickly than we can type. Nuance Technologies, a company specializing in speech recognition, estimates in its marketing materials that most people speak more than 120 words per minute, but type fewer than 40 words per minute.

I recently learned just how realistic that number is. I was trying to transcribe the recording of an interview I had just conducted with Dr. David McAllister, Computer Science Professor at North Carolina State University in Raleigh. Dr. McAllister is, among many other things, part of a research team doing work in computerized speech processing. When transcribing the interview, I found myself needing to pause the recording every ten seconds or so, sometimes rewinding to re-listen to words I'd missed. My fingers simply could not keep up with the pace of his voice.

The problem wasn't the speed or the clarity of his voice. His evenly measured baritone was no more rapidly spoken than the average person's voice, and his syllables were clearly articulated. And I'd like to think the problem wasn't my typing abilities. In high school I took a typing class, and my keyboard proficiency has been shaped by years of instant messaging and web surfing. The issue was the basic fact that our hands are a clumsy way to convert our thoughts into a readable form. Our voices, on the other hand, are like a wormhole leap straight from Star Trek, a direct portal from our brains to the outside world.

If a computer could have automatically converted Dr. McAllister's voice into text for me, the process would have taken much less time on my part. Looking at society as a whole, similar scenarios are plentiful. Transcriptions of medical and legal information, for example, currently are very time-consuming, and can be made much more efficient with the use of speech recognition. And time is money, of course.
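Nuance's per-minute figures make for a quick back-of-the-envelope comparison. The 120 and 40 words-per-minute rates come from the marketing estimate quoted above; the 5,000-word interview length is an invented example:

```python
# Rough arithmetic on Nuance's estimate: most people speak at more than
# 120 words per minute but type at fewer than 40 words per minute.
SPEAKING_WPM = 120
TYPING_WPM = 40

def minutes_to_produce(words: int, wpm: int) -> float:
    """Minutes needed to produce `words` words at `wpm` words per minute."""
    return words / wpm

# A hypothetical 5,000-word interview transcript:
words = 5000
spoken = minutes_to_produce(words, SPEAKING_WPM)  # time to say it aloud
typed = minutes_to_produce(words, TYPING_WPM)     # time to type it out

print(f"Spoken in {spoken:.0f} min, typed in {typed:.0f} min "
      f"({typed / spoken:.0f}x slower)")
```

Even before accounting for all the pausing and rewinding, typing out a transcript takes roughly three times as long as the speech itself took to say.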
[Image: Dragon NaturallySpeaking 9 Standard]
McAllister told me about a neighbor of his who uses speech recognition software regularly. "He talks to his computer and has it do things for him," says McAllister. "He uses it to create email and other messages, and stuff like that works very well. It's not always perfect, but it's much better than you would think." His neighbor uses a standalone program called Dragon NaturallySpeaking, produced by Nuance Technologies. It's the world's best-selling speech recognition software.
[Figure: what NaturallySpeaking recorded during my trial, e.g. "Haircut or effect of takeover tactics," "Was offset from them, when I believe in mankind," "To be order not the: man is the question," "NaturallySpeaking is the greatest piece of software"]
I decided to try the program for myself. Luckily, the N.C. State library's Assistive Technologies Center had a copy of the program available for me to try out. Getting started with the program was a very simple process: I just put on a headset with a microphone attached, opened the program, and started talking. There is an option to set up a new profile and train the program to understand your voice, a process that takes roughly 30 minutes depending on how thorough you want to be.
As the figure above shows, the results of my trial were decidedly mixed. I measured my average voice dictation speed to be roughly 200 words per minute (I average about 60 when typing), but I can't say the improved speed fully made up for the errors. To be fair, the examples I chose are some of the worst; realistically, the dictation averaged about one or two errors per sentence. And I could see a moderate amount of improvement as my trial progressed: I was learning how to use the program (using keywords to dictate commas and periods, for example), and as I corrected its errors, it was beginning to train itself to my voice. It's probably safe to say the results would have been much more agreeable if I'd trained the program for a period of days or weeks, just as any serious user of the program would. (McAllister's neighbor had done this, of course.)

Another feature of NaturallySpeaking is the ability to control the mouse by voice. This is accomplished by something called the Mousegrid, which divides the screen into increasingly small numbered rectangles and moves the mouse into the rectangle you command it to. The figure to the left demonstrates how I used the Mousegrid to close a browser window. It was easy enough to use, and for someone who can't use a mouse it would be an essential feature. However, it takes the computer a moment to render each grid onto the screen, and it was necessary to pause a bit between words. It took a total of approximately five seconds for me to close the window. That may not sound very long, but closing a window with the mouse itself takes under a second.

NaturallySpeaking is the most widely used standalone speech recognition program, but many personal computers are sold with a speech recognition program already built in.
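The Mousegrid's mechanics can be sketched in a few lines. In Dragon's implementation the grid is 3 by 3, numbered 1 through 9, and each spoken number replaces the grid with a smaller one inside the cell you named. Here is a toy version of that arithmetic (the screen size and the spoken choices are invented examples):

```python
# Toy sketch of Mousegrid-style refinement, assuming a 3x3 grid
# numbered 1..9 left-to-right, top-to-bottom.
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, width, height)

def refine(rect: Rect, choice: int) -> Rect:
    """Return the sub-rectangle selected by saying a number 1-9."""
    left, top, width, height = rect
    row, col = divmod(choice - 1, 3)      # e.g. "5" is the centre cell
    return (left + col * width / 3,
            top + row * height / 3,
            width / 3,
            height / 3)

def mousegrid(screen: Rect, choices: List[int]) -> Tuple[float, float]:
    """Apply successive spoken choices and return the final click point."""
    rect = screen
    for c in choices:
        rect = refine(rect, c)
    left, top, width, height = rect
    return (left + width / 2, top + height / 2)  # click the cell's centre

# Saying "3, 3, 3" on a 1920x1080 screen homes in on the top-right
# corner, roughly where a window's close button sits.
print(mousegrid((0, 0, 1920, 1080), [3, 3, 3]))
```

Each spoken number shrinks the target region by a factor of three per side, so three or four commands are enough to reach button-sized precision anywhere on screen. But with a rendering pause between each command, those few steps add up, which matches the five seconds the whole window-closing operation took me.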
RESEARCHING SPEECH
I spoke with Dr. McAllister to learn more about the science behind speech processing and what's holding it back from working perfectly. McAllister's research career was already well underway when he came to speech processing, working with signal processing methods which had not been applied to tell what a person was saying. These complex methods were used to process speech signals and produce a computer animation of them being spoken. Such a method was of interest to video game and movie animation companies, for example. New to the area of signal processing at the time, McAllister played the role of graduate student for a while.

After that, McAllister and his research partners realized their new signal processing techniques could be used for an entirely different type of speech processing, called speaker recognition. Unlike speech recognition, which seeks to identify the words being spoken, speaker recognition is concerned with identifying the speaker. Many of the underlying problems are shared between the two areas, but the majority of McAllister's speech processing experience is in speaker recognition. There are many uses for speaker recognition technology, including criminal justice and security.
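As a rough illustration of the distinction, here is a toy speaker identifier: given models of known speakers, it picks whichever one a new recording most resembles. The two-number "feature vectors" are invented for illustration; real systems model a voice far more richly than this sketch does.

```python
# Toy speaker identification: pick which enrolled speaker a new
# sample most resembles, by nearest model in feature space.
import math

# Enrolled speakers, each summarised by an averaged feature vector
# (the names and numbers here are made up).
enrolled = {
    "alice": (1.0, 4.0),
    "bob":   (5.0, 1.0),
    "carol": (8.0, 6.0),
}

def identify(sample: tuple) -> str:
    """Return the enrolled speaker whose model is nearest the sample."""
    return min(enrolled, key=lambda name: math.dist(enrolled[name], sample))

# A new recording whose features land close to Bob's model:
print(identify((4.6, 1.3)))   # -> bob
```

Note that this sketch must answer with one of the enrolled speakers; the harder open-ended case, where the right answer may be "none of the above," is a different problem.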
The plots above are from a 2002 paper written by McAllister and four colleagues at N.C. State. It uses a complex mathematical technique to model the speaker's voice in two dimensions, as shown on the plots. Even without understanding exactly what the plots mean, it's easy to see that the two left plots are much more similar than the other two, because they're from the same speaker. Much of the research being done in speaker recognition deals with criminal justice, and is being subsidized by the government. "It is of interest for the FBI, for instance, to be able to identify people who have issued bomb threats over the telephone," says McAllister, "and lawyers would like to be able to establish that either a person did or didn't say certain things on the telephone." In cases in which it's known for a fact that the speaker is a member of a given group of people (called a closed-set problem), the speaker can be chosen at a forensic quality of 95% or more, given enough recorded speech.
WHAT TO EXPECT
It's clear that some uses of speech recognition are more realistic in the near future than others. We probably can expect more speech systems that help make our lives more convenient, as in the case of hands-free computer use. It's been demonstrated that, under the right conditions, that sort of thing can be done with a high level of reliability.
But until that reliability goes from high to perfect, we can't expect to see things that rely on speech processing alone, only ones that use it as a supplement. Imagine if your voice were used to log into your computer instead of a password. What if you had a sore throat and couldn't log in at all? It's safe to say we'll all own keyboards for the foreseeable future, even if we might not be typing on them quite as often.

If the Universal Translator only worked 90% (or even 99%) of the time, the Star Trek shows would be more dramatic, to say the least. It's probably safe to say at least a few intergalactic wars would've been caused when a word or two got misinterpreted. Fortunately, it should be a while before we start running into Klingons or Ferengis, and there's plenty of time to get our Universal Translators ready for that day.