
Abstract

Who doesn't want the luxury of an assistant who always listens for your call, anticipates your every need, and takes action when necessary? That luxury is now available thanks to artificial-intelligence-based Desktop Assistants.

Desktop Assistants come in fairly small packages and can perform a variety of actions after hearing your command. They can turn on lights, answer questions, play music, place online orders, and handle many other AI-driven tasks.

Desktop Assistants are not to be confused with virtual assistants, who are people working remotely and can therefore handle all kinds of tasks. Desktop Assistants, by contrast, are technology based. As they become more robust, their utility in both the personal and business realms will grow as well.

“It isn’t surprising to hear someone speak to someone who isn’t there. We ask Alexa for the weather and to turn down the thermostat. Then we ask Siri what our schedule for the day is and to call people. We are more connected than ever through our voice and voice-interface technology. I can’t imagine doing things manually anymore! It’s truly the future.”

1. INTRODUCTION

1. In recent times, Desktop Assistants gained a major platform after Apple integrated Siri, its virtual assistant, into its devices. But the timeline of this evolution really began in 1962, at the Seattle World's Fair, where IBM demonstrated a unique apparatus called Shoebox. It was the size of an actual shoebox, could perform simple arithmetic, and could recognize 16 spoken words as well as the digits 0 to 9, speaking its answers back in a human-recognizable voice.
2. During the 1970s, researchers at Carnegie Mellon University in Pittsburgh, Pennsylvania, with considerable help from the U.S. Department of Defense and its Defense Advanced Research Projects Agency (DARPA), created Harpy. It could understand almost 1,000 words, which is approximately the vocabulary of a three-year-old child.
3. In the 1990s, big organizations such as Apple and IBM started building products that used voice recognition. In 1993, Apple began shipping speech recognition on its Macintosh computers with PlainTalk.
4. In April 1997, Dragon NaturallySpeaking became the first continuous dictation product; it could understand around 100 words per minute and transform them into readable text.
A Desktop Assistant, or intelligent personal assistant, is a software agent that can perform tasks or services for an individual based on verbal commands, i.e. by interpreting human speech and responding via synthesized voice. Users can ask their assistant questions, control home-automation devices and media playback by voice, and manage other basic tasks such as email, to-do lists, and opening or closing applications, all with verbal commands.
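As a concrete illustration of this listen-and-respond loop, the following is a minimal Python sketch. It assumes the third-party SpeechRecognition and pyttsx3 packages are installed; the function names and the sample "time" command are illustrative choices, not part of any specific product.

```python
# Minimal listen -> interpret -> respond loop (sketch, not a full assistant).
import speech_recognition as sr
import pyttsx3
from datetime import datetime

recognizer = sr.Recognizer()
engine = pyttsx3.init()  # uses the platform's built-in voices (SAPI5 on Windows)

def speak(text: str) -> None:
    """Respond to the user with a synthesized voice."""
    engine.say(text)
    engine.runAndWait()

def listen() -> str:
    """Capture a spoken command from the microphone and return it as text."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)  # Google Web Speech API
    except sr.UnknownValueError:
        return ""

if __name__ == "__main__":
    command = listen().lower()
    if "time" in command:
        speak("The time is " + datetime.now().strftime("%I:%M %p"))
    else:
        speak("Sorry, I did not catch that.")
```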

Consider the example of Braina (Brain Artificial), an intelligent personal assistant, human-language interface, automation, and voice-recognition software for Windows PCs. Braina is a multi-functional AI application that lets you interact with your computer using voice commands in most of the world's languages, and it can accurately convert speech to text in over 100 different languages.

2. PROBLEM DOMAIN

Since the Amazon Echo shipped in late 2014, smart speakers and voice assistants have been advertised as the next big thing. Nearly four years later, despite the millions of devices sold, it is clear that, like many other visions of the tech industry, that perception was an overstatement. A testament to this fact: most people are not using Alexa to make purchases, one of the main advertised use cases of Amazon's AI-powered voice assistant.

Voice assistants existed before the Echo; Apple released Siri in 2011 for iOS devices. But the Echo was the first device for which voice was the only input medium, and the years since have made the limits of voice more prominent.

To be clear, voice assistants are very useful, and their applications will continue to expand into an increasing number of domains in our daily lives, just not in the omnipresent way that an all-purpose AI assistant implies. The future of voice is the integration of artificial intelligence into plenty of narrow settings and tasks rather than a broad, general-purpose AI assistant that can fulfill anything and everything you can think of.

The problem with integrating too many commands into smart speakers

Voice recognition is a relatively narrow field. This means that, given enough samples, you can build a model that recognizes and transcribes voice commands under different circumstances and with different background noises and accents.

The objective is to make a desktop assistant that is operational in multiple languages and can perform the following tasks (a command-dispatch sketch is shown after this list):

1. Open a subreddit in the browser.
2. Open any website in the browser.
3. Send an email to your contacts.
4. Launch any system application.
5. Tell you the current weather and temperature of almost any city.
6. Tell you the current time.
7. Greet you.
8. Play a song in the VLC media player (VLC must be installed on your laptop/desktop).
9. Change the desktop wallpaper.
10. Tell you the latest news feeds.
11. Tell you about almost anything you ask.
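The sketch below shows one way the recognized text could be routed to a handful of the tasks above. It uses only the Python standard library; the trigger phrases and the assumption that VLC is available on the system PATH are illustrative, not requirements stated elsewhere in this report.

```python
# Hedged sketch of a command dispatcher for a few of the listed tasks.
import os
import webbrowser
from datetime import datetime

def handle(command: str) -> str:
    command = command.lower()
    if "open reddit" in command:
        subreddit = command.split("open reddit")[-1].strip() or "all"
        webbrowser.open(f"https://www.reddit.com/r/{subreddit}")
        return f"Opening r/{subreddit}"
    if "open website" in command:
        site = command.split("open website")[-1].strip()
        webbrowser.open(f"https://{site}")
        return f"Opening {site}"
    if "time" in command:
        return "The time is " + datetime.now().strftime("%I:%M %p")
    if command.startswith(("hello", "hi")):
        return "Hello! How can I help you?"
    if "play" in command:
        # Assumes VLC is installed and available on the PATH.
        os.system('vlc "' + command.replace("play", "").strip() + '"')
        return "Playing on VLC"
    return "Sorry, I can't do that yet."

print(handle("open reddit python"))
```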

BUDGET DATA

3. SOLUTION DOMAIN

Architecture

The architecture below is described with SIRI as the reference model.

Inputs:
SIRI takes explicit user input through a voice interface. This input is combined with contextual information such as location, time, and other task context, and fed into the system to identify the task that needs to be executed.

NLP Modules:

The NLP modules include text parsers, a vocabulary for the agent, and a language interpreter. This module is responsible for converting voice input into text.
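Beyond speech-to-text, the text-parsing side of such a module can be as simple as matching the transcribed words against a small vocabulary of intents. The sketch below is only an illustration; the intent names and keyword lists are assumptions, not part of SIRI or of this project's specification.

```python
# Tiny intent parser: map transcribed text onto a vocabulary of intents.
import re

VOCABULARY = {
    "weather": ["weather", "temperature", "forecast"],
    "time":    ["time", "clock"],
    "music":   ["play", "song", "music"],
    "email":   ["email", "mail", "send"],
}

def parse_intent(text: str) -> str:
    """Return the first intent whose keyword list matches the transcribed text."""
    words = re.findall(r"[a-z']+", text.lower())
    for intent, keywords in VOCABULARY.items():
        if any(word in keywords for word in words):
            return intent
    return "unknown"

print(parse_intent("what is the weather like today"))  # -> "weather"
```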

Memory:

SIRI maintains short-term as well as long-term memory to provide intelligent, personalized task execution. Short-term memory lets the agent cache answers to related questions and avoid asking the user the same question multiple times, which improves the agent's performance. The agent can also use short-term memory to store the output of tasks, representing the new state of the environment, along with the user's contextual information. Long-term memory, in contrast, stores the user's interests and patterns in answers, which help the agent predict some of the choices the user may make and focus on those patterns while querying the services.
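The short-term versus long-term split described above could be modeled as follows. This is a minimal sketch under assumed names; the class, file name, and field names are not taken from SIRI or from this project.

```python
# Sketch of an agent memory with a per-session cache and a persisted store.
import json
import os

class AgentMemory:
    def __init__(self, path: str = "long_term_memory.json"):
        self.short_term = {}              # cleared every session
        self.path = path
        self.long_term = self._load()     # survives across sessions

    def _load(self) -> dict:
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)
        return {"interests": [], "answer_patterns": {}}

    def remember_answer(self, question: str, answer: str) -> None:
        """Cache an answer so the user is not asked the same question twice."""
        self.short_term[question] = answer

    def recall(self, question: str):
        return self.short_term.get(question)

    def record_pattern(self, question: str, answer: str) -> None:
        """Persist recurring answers so future choices can be predicted."""
        self.long_term["answer_patterns"][question] = answer
        with open(self.path, "w") as f:
            json.dump(self.long_term, f)
```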
Web Service Integration:

SIRI depends on external data and web service providers to gather information on specific services available at a location, as well as for generic search capabilities. The following web services are used for the different domains of questions posed by the user:

 Restaurant and businesses: OpenTable, Gayot, CitySearch, BooRah, Yelp, Yahoo Local,
ReserveTravel, Localeze

 Events & Tickets: Eventful, StubHub, and LiveKick

 Movies and tickets: MovieTickets, Rotten Tomatoes, and the New York Times

 Factual Questions: Bing Answers, Wolfram Alpha, Evi and Wikipedia

 Web Search: Bing, Yahoo, Google, and Wikipedia

 Maps: Google Maps and Yelp! Search

 iOS Applications: contacts, calendars, clock, reminders, browser, phone call, SMS

Web services provide flexibility as well as modularity to SIRI, since specialized tasks can be offloaded to these services while SIRI focuses on the user interface and on integrating with their APIs. The set of web services used by SIRI is hard-coded into the task models, and so it is known at design time as well as at run time. This set is updated in order to integrate any web service that supports the existing tasks or new tasks defined by SIRI's task models.

Dialog Flow Processor:

The dialog flow processor is involved in each of the user-input and disambiguation phases while SIRI tries to map the user's request to web service calls or to internal iOS applications. This module enables the agent to keep the context of the question in mind and to ask the user follow-up questions in order to gather any missing information needed to complete the task.
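In essence this is slot filling: keep asking until every piece of information the task requires has been collected. The sketch below illustrates the idea with an assumed "book_table" task and assumed slot names; it is not a description of SIRI's actual dialog models, which are not public.

```python
# Slot-filling dialog sketch: ask follow-up questions for missing information.
REQUIRED_SLOTS = {"book_table": ["restaurant", "party_size", "time"]}

def run_dialog(task: str, known: dict, ask) -> dict:
    """Fill in the missing slots for a task by asking follow-up questions."""
    for slot in REQUIRED_SLOTS[task]:
        if slot not in known:
            known[slot] = ask(f"What {slot.replace('_', ' ')} would you like?")
    return known

# Console input standing in for the voice interface:
# details = run_dialog("book_table", {"restaurant": "Joe's"}, ask=input)
```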

Output:

SIRI represents task models as a sequence of steps that involve calls to web services to gather information, followed by a dialog with the user to complete the task. Once SIRI identifies the task to be performed, the output of the task can be information gathered by calling a web service, or an automated action such as setting up a reminder or creating a new meeting invite. SIRI then maps this output to a user-consumable form.

Figure 2-3 shows the different models within SIRI that are designed to model the specific task domain, dialog flow, task flow, entities for the task, and integration with web services. The dialog flow models shown in Figure 2-3 help the agent to hold a conversational dialog with the user and collect information about the task to be performed. Any ambiguities in the user input are handled through dialog. This helps the agent collect user input in parts and build on the information it has already collected.

Task flow models define the workflow for an identified task. These models define the task's set of dependencies, its preconditions, its decomposition into sub-tasks, and its postconditions. Specializations of the task model implement specific tasks, such as the dining-out domain model shown in Figure 2-3. This model enables the agent to plan the task and to collect all the relevant information for a given sub-task at the time it is required. These models explicitly bind to the web services that can be used to complete the task. This binding is done at design time, so at execution time the agent knows beforehand which set of web services it will interact with for a particular task.
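The preconditions / decomposition / postconditions structure described above can be captured in a few lines of Python. This is only a sketch under assumed names; TaskFlow, the dining-out sub-task, and the context dictionary are illustrative, not SIRI internals.

```python
# Sketch of a task-flow model: preconditions, sub-task decomposition,
# and postconditions checked around execution.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class TaskFlow:
    name: str
    preconditions: List[Callable[[Dict], bool]] = field(default_factory=list)
    sub_tasks: List[Callable[[Dict], None]] = field(default_factory=list)
    postconditions: List[Callable[[Dict], bool]] = field(default_factory=list)

    def execute(self, context: Dict) -> bool:
        if not all(check(context) for check in self.preconditions):
            return False                     # task cannot start yet
        for sub_task in self.sub_tasks:      # task decomposition
            sub_task(context)
        return all(check(context) for check in self.postconditions)

# Example: a "dining out" task bound to a placeholder web-service call.
dining_out = TaskFlow(
    name="dining_out",
    preconditions=[lambda ctx: "location" in ctx],
    sub_tasks=[lambda ctx: ctx.update(restaurants=["placeholder result"])],
    postconditions=[lambda ctx: bool(ctx.get("restaurants"))],
)
print(dining_out.execute({"location": "Boston"}))  # -> True
```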

4. SYSTEM DOMAIN
Technology used (see the sketch after this list):

- Google Language API

- Microsoft built-in voices

- Speech Recognition Module

- Wikipedia Module
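The sketch below shows how the listed pieces might be combined: the wikipedia package for factual answers and pyttsx3, which exposes the Microsoft built-in (SAPI5) voices on Windows. The exact package names and the sample topic are assumptions; the report itself does not specify them.

```python
# Combine the Wikipedia module with the Microsoft built-in voices via pyttsx3.
import wikipedia
import pyttsx3

engine = pyttsx3.init()
for voice in engine.getProperty("voices"):   # lists the installed SAPI5 voices
    print(voice.name)

def tell_me_about(topic: str) -> None:
    """Fetch a short Wikipedia summary and read it aloud."""
    summary = wikipedia.summary(topic, sentences=2)
    engine.say(summary)
    engine.runAndWait()

# tell_me_about("Artificial intelligence")
```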

Programming language:

- Python

Tools:

• Visual Studio Code

• Jupyter notebook

• Spyder

Minimum system requirements:

• Processors: Intel Atom® processor or Intel®Core™ i3 processor

• Disk space: 5 GB

• Operating systems: Windows 7 or later

5. APPLICATION DOMAIN

Artificial intelligence personal assistants have become plentiful over the last few years. Applications such as Siri, Bixby, Ok Google, and Cortana make mobile-device users' daily routines that much easier. You may be asking yourself how these function. The assistants receive external data (such as movement, voice, light, GPS readings, visually defined markers, etc.) via the hardware's sensors for further processing, and take it from there to act accordingly.

This system can be used in the following domains:

• Forecasting

• Operation of laptop by commands

• Business growth

• Usable in multiple languages

• Business Intelligence
6. EXPECTED OUTCOME

Invoking iOS applications based on user requests

 Call someone from my contacts list
 Launch an application on my desktop
 Send a text message to someone
 Set up a meeting on my calendar for 9am tomorrow
 Set an alarm for 5am tomorrow morning
 Play a specific song in my iTunes library
 Enter a new note

Calling external services to serve user requests (one example call is sketched after this list)

 Search for text (on search engines – Google, Bing, Wikipedia, etc.)
 Search for a concept (in Wolfram Alpha)
 Get directions from my current location to home
 Tweet a message
 Post a message or photo to Facebook
 Check weather today at a location
 What movies are playing at AMC theater in Cambridge
 Get the latest score for the Red Sox game today
 Book a table for two at a restaurant in Boston
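As one concrete example of such an external-service call, the sketch below checks the weather for a city. The report does not name a weather provider; wttr.in is used here only as an example of a free plain-text endpoint, queried through the requests package.

```python
# Example external-service call: current weather for a city (assumed provider).
import requests

def check_weather(city: str) -> str:
    response = requests.get(f"https://wttr.in/{city}", params={"format": "3"})
    response.raise_for_status()
    return response.text.strip()   # e.g. "Boston: ⛅️ +18°C"

# print(check_weather("Boston"))
```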

7. REFERENCES

 https://towardsdatascience.com/build-your-first-voice-assistant-85a5a49f6cc1
