
AT&T Developer Program

AT&T API Platform

Give Your Apps the Power of Voice

Speech to Text

What does the AT&T Speech API do?
The AT&T Speech API is supported cross-carrier and is optimized to transcribe speech to text for numerous applications, making it easier for your customers to use voice to interact with your application. You send us audio. We send you text. It's that easy!

What is the AT&T Speech API optimized to do?
The AT&T Speech API is optimized for the following contexts:

Web Search Speech to Text: Transcribes a spoken search into text and returns the text to the app. The app can then perform a web search and return results to the user.
Business Search Speech to Text: Transcribes a spoken search into text and returns the text to the app. The app can then perform a business search and return results to the user.
Voicemail to Text: Transcribes voicemail to text, so users have the option of reading their messages instead of listening to them.
SMS Speech to Text: Transcribes your users' spoken messages to text and returns the text to the app. The app can then populate and send text messages on behalf of your users.
Question and Answer Transcription: Transcribes your users' spoken questions into text and returns the text to the app. The app can then perform a search on the question and return results to the user.
TV Speech to Text: Transcribes spoken AT&T U-verse(TM) program guide information into text and returns the text to the app.
Generic Speech to Text: Automatically detects whether the speech is English or Spanish, and returns the appropriate text transcription.
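As an illustration of the "send audio, get text" flow for the speech contexts listed above, the following minimal sketch builds (but does not send) an HTTP request. The endpoint URL, the X-Speech-Context header name, and the short context values are all placeholders invented for this example; this document does not specify them, so check the official API reference for the real names.

```python
import urllib.request

# Placeholder endpoint: the real URL is not given in this document.
SPEECH_URL = "https://api.example.com/speech/speechToText"

# Keys are the context names from the list above.
# Values are hypothetical wire identifiers chosen for this sketch.
SPEECH_CONTEXTS = {
    "Web Search Speech to Text": "WebSearch",
    "Business Search Speech to Text": "BusinessSearch",
    "Voicemail to Text": "Voicemail",
    "SMS Speech to Text": "SMS",
    "Question and Answer Transcription": "QuestionAndAnswer",
    "TV Speech to Text": "TV",
    "Generic Speech to Text": "Generic",
}


def build_request(audio: bytes, context: str, content_type: str) -> urllib.request.Request:
    """Build a POST request carrying raw audio for one speech context."""
    if context not in SPEECH_CONTEXTS:
        raise ValueError(f"unknown speech context: {context!r}")
    headers = {
        "Content-Type": content_type,      # e.g. "audio/amr" or "audio/wav"
        "Accept": "application/json",      # the text comes back as JSON
        "X-Speech-Context": SPEECH_CONTEXTS[context],  # hypothetical header name
    }
    return urllib.request.Request(SPEECH_URL, data=audio, headers=headers, method="POST")
```

A real call would also need whatever authentication the platform requires, which is outside the scope of this sketch.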


AT&T Speech FAQ

What technology is used to power the AT&T Speech API?
Our Speech API is powered by the robust AT&T WATSON(SM) speech engine.

AT&T WATSON(SM) converts between different communication modalities, so people and devices can interact more readily. It consists of a general-purpose engine and a collection of plug-ins, each of which performs a conversion or analysis task. These tasks, many involving speech and language, can be combined in various ways, depending on what information is being communicated. AT&T WATSON(SM) is programmed to learn different accents, speaker variations, background environments, platform variations, dialects, and speech patterns and, thus, continually improve accuracy over time. AT&T has accumulated more than 600 patents on the technology, and developed a service that offers fast and accurate transcriptions for your customers.

AT&T WATSON(SM) has been used within AT&T for interactive voice response (IVR) customers, including AT&T's VoiceTone service, for more than 20 years. Besides customer care IVR, AT&T WATSON(SM) has been used for speech analytics, speech translation (including the AT&T Translator app), mobile voice search of multimedia data, video search, voice remote, and voicemail to text. AT&T WATSON(SM) also supports Speak4it (local business search).

Do you offer SDKs?

Coming soon, we'll offer native and web services SDKs that can help jump-start your application development process. Use our native SDKs to create a thin client that is embedded in smartphone applications. Use our Sencha or Microsoft SDKs to create a UI for web services and HTML5-based applications. You also have the option to build your own UI and to use the speech enabler on any operating system, including web applications.

Which languages are supported?

At this time, we offer English support for our optimized speech operations. We also offer automatic English and Spanish detection with the Generic Speech to Text operation.

Will the Speech API transcribe speech as well as human transcription services?
No. Technology has not yet reached the point where machines can handle all the variability in how we speak and so understand (and transcribe) speech as well as a person can. That said, we are constantly working to enhance the recognition of the Speech API with the intent that it will improve over time.

Does the Speech API only work on AT&T mobile devices?

We offer cross-carrier support by facilitating speech-to-text transcription for almost any mobile phone, even across other U.S. wireless carrier services.

Do I need a speech specialist on staff to get the best use of the Speech API?
We designed our API to be truly plug and play. You connect to our API, and that's it. We maintain the platform, tune the speech libraries, and ensure that they remain current and state of the art.

How much will it cost once AT&T begins charging for the service in 2013?
Roughly, $0.01 per transaction. See tentative pricing details on the AT&T Developer Program site.
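For budgeting, the approximate per-transaction figure above translates into a simple multiplication. The sketch below assumes a flat rate with no volume tiers or free allowance, which this FAQ does not confirm; the function name is illustrative only.

```python
from decimal import Decimal

# Approximate, tentative figure quoted in the FAQ answer above.
PRICE_PER_TRANSACTION = Decimal("0.01")


def estimated_cost(transactions: int) -> Decimal:
    """Estimate the charge in dollars for a number of transcription requests."""
    if transactions < 0:
        raise ValueError("transactions must be non-negative")
    return PRICE_PER_TRANSACTION * transactions
```

Under that flat-rate assumption, 25,000 transactions in a month would come to roughly $250.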

Which audio types are supported by the Speech API?

Both AMR and WAV audio formats are supported. The preferred format is AMR, which is the format generated by the SDK-supported UI.
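A client that accepts audio files from users may want to reject unsupported formats before uploading. The helper below is a minimal sketch that maps a file extension to a media type; the function name is hypothetical, and the exact Content-Type strings the service expects are an assumption (audio/amr and audio/wav are common media types for these formats).

```python
from pathlib import Path

# Supported formats per the FAQ; the media type strings are assumed values.
AUDIO_CONTENT_TYPES = {
    ".amr": "audio/amr",  # preferred format
    ".wav": "audio/wav",
}


def content_type_for(filename: str) -> str:
    """Return the media type for a supported audio file, judged by extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in AUDIO_CONTENT_TYPES:
        raise ValueError(f"unsupported audio format: {suffix or filename!r}")
    return AUDIO_CONTENT_TYPES[suffix]
```

Checking the extension is only a first filter; it does not verify that the file contents are actually valid AMR or WAV data.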

Is there a limit to the amount of speech I can send to the Speech API?
The Voicemail to Text speech context will transcribe up to four minutes of audio per transaction. All other speech contexts will transcribe up to one minute of audio per transaction.
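Because audio beyond these limits is not transcribed, it can be useful to check a clip's length on the client first. The sketch below does this for WAV input using Python's standard wave module; AMR files would need a different parser, and the function and constant names are illustrative only.

```python
import wave

VOICEMAIL_LIMIT_S = 4 * 60  # Voicemail to Text: up to four minutes
DEFAULT_LIMIT_S = 60        # all other contexts: up to one minute


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_limit(source, voicemail: bool = False) -> bool:
    """Check whether a WAV clip is within the per-transaction limit."""
    limit = VOICEMAIL_LIMIT_S if voicemail else DEFAULT_LIMIT_S
    return wav_duration_seconds(source) <= limit
```

A clip that exceeds the limit could be trimmed or split into several transactions before sending.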

In what format is the text returned?

The text is returned in JSON format.
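Since the response is JSON, an app can extract the transcription with any standard JSON parser. The sketch below assumes a response shaped as a list of hypotheses with confidence scores; the field names Recognition, NBest, Hypothesis, and Confidence are placeholders for this example, as this document does not describe the response schema.

```python
import json


def extract_text(body: str) -> str:
    """Pull the highest-confidence transcription out of a JSON response body."""
    payload = json.loads(body)
    # Hypothetical schema: {"Recognition": {"NBest": [{"Hypothesis": ..., "Confidence": ...}]}}
    hypotheses = payload["Recognition"]["NBest"]
    if not hypotheses:
        raise ValueError("response contains no transcription hypotheses")
    best = max(hypotheses, key=lambda h: h.get("Confidence", 0.0))
    return best["Hypothesis"]
```

With the real schema in hand, only the key names in this function should need to change.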