Akira - A Perfect Human Companion

Abhay Pratap Singh, Abhinav Dwivedi, Ankita Verma, Avinash Kumar, Om Krishna Gupta
abhay000singh@gmail.com, dwivedi.abhinav006@gmail.com, ankita214.verma@gmail.com, avinashkumar18022@gmail.com,
om_0502@yahoo.co.in

Department of Electronics & Communication Engineering, Ajay Kumar Garg Engineering College, Ghaziabad
ABSTRACT- Rapid growth in technology has driven digitization and development over the past few decades. There remains a need for a companion that not only helps with daily tasks but also makes life easier, and this is where Akira comes into the picture.

I. INTRODUCTION

Alexa and Google Home implement some of the features expected of a human companion, but there is still a need for a virtual assistant that covers personal needs across all domains. Akira is intended to fill that gap.
Akira can manage daily tasks with a single voice command. It memorizes your daily schedule, suggests ways to improve your lifestyle, and helps you stay connected with your loved ones. It also monitors activities such as sleep and lets you control household objects, such as a coffee machine, through voice commands.

II. COMPONENTS

A. USER AUTHENTICATION
• Face Recognition: A facial recognition system is a technology capable of identifying or verifying a person from a digital image.

• Speaker Recognition: Speaker recognition is the identification of a person from the characteristics of their voice.

B. SPEECH PROCESSING
• Natural language processing (NLP) is the ability of a computer program to understand human language as it is spoken. NLP is a component of artificial intelligence (AI).

• NLU enables computers to understand commands without the formalized syntax of computer languages and to communicate back to humans in their own languages.

C. IoT HOME CONTROL
Household devices are connected to Akira through the Internet of Things, so electronic devices can be controlled through voice commands.

D. TEXT TO SPEECH CONVERSION
The desired response is conveyed to the user through text-to-speech conversion using various speech engines.

III. WORKING

A. USER AUTHENTICATION
• Face Recognition: This is a technology capable of identifying or verifying a person from a digital image. The face recognition technique uses 3D sensors to capture information about the shape of a face. This information is then used to identify distinctive features on the surface of the face, such as the contour of the eye sockets, nose, and chin.

• Face recognition algorithms identify facial features by extracting landmarks, or features, from an image of the subject's face. For example, an algorithm may analyze the relative position, size, or shape of the eyes, nose, cheekbones, and jaw. These features are then used to search for other images with matching features.

• Speaker Recognition: Speaker recognition is implemented using machine learning. First, a GMM (Gaussian mixture model) of the user is prepared using Python libraries and test audio data. During identification, the newly generated model is compared with the previously saved models, and the user is thus authenticated.

Fig1: MFCC feature extraction process

B. SPEECH PROCESSING
• Speech recognition is the interdisciplinary sub-field of computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers.

• The audio is first converted into text using the Google Cloud API. The generated text is then used to identify the action to be performed, using natural language processing implemented in Python. The desired task is then performed.

C. IoT HOME CONTROL
An IoT system consists of sensors and devices that "talk" to the cloud through some kind of connectivity. Once the data reaches the cloud, software processes it and may decide to perform an action, such as sending an alert or automatically adjusting the sensors or devices, without any intervention from the user.
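The cloud-side decision step described above can be illustrated with a minimal sketch. It is not Akira's actual implementation: the `Reading` and `Action` types, the `decide` function, the sensor kinds, and the temperature threshold are all hypothetical and were chosen only to show how incoming sensor data might be mapped to an alert or an automatic adjustment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    """One sensor sample as it might arrive in the cloud (hypothetical schema)."""
    device_id: str
    kind: str      # e.g. "temperature" or "motion"
    value: float


@dataclass
class Action:
    """A decision taken by the cloud-side software (hypothetical schema)."""
    type: str      # "alert" or "adjust"
    device_id: str
    detail: str


# Assumed threshold, chosen only for illustration.
TEMP_HIGH_C = 30.0


def decide(reading: Reading) -> Optional[Action]:
    """Return an action for a reading, or None when nothing needs to happen."""
    if reading.kind == "motion" and reading.value > 0:
        # Motion is treated as a possible intrusion: notify the user.
        return Action("alert", reading.device_id, "motion detected")
    if reading.kind == "temperature" and reading.value > TEMP_HIGH_C:
        # A hot room is handled automatically, without user involvement.
        return Action("adjust", reading.device_id, "turn fan on")
    return None
```

For example, `decide(Reading("cam-1", "motion", 1.0))` yields an alert action, while a temperature reading of 22.0 yields `None`, so no device is touched. A real deployment would replace these fixed rules with whatever processing the cloud service performs.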
D. TEXT TO SPEECH CONVERSION

Fig2: Text to speech conversion process

Various speech engines can be used to convert text to speech, including the Google speech engine, the Python module pyttsx, and many more.

IV. BLOCK DIAGRAM

V. HARDWARE AND SOFTWARE SPECIFICATIONS

A) HARDWARE SPECIFICATIONS
• Camera
• Speaker
• Raspberry Pi
• Jumpers
• Microphone
• LCD
• Sensors

B) SOFTWARE SPECIFICATIONS
• Python 3.6
• TensorFlow
• Tkinter
• NumPy
• pandas
• PyAudio
• Audiooy
• SpeechRecognition
• pyttsx
• Google Cloud Speech-to-Text

VI. COMMERCIAL APPLICATION AND FUTURE SCOPE

A. COMMERCIAL APPLICATIONS
• As a home assistant
• Security
• Lifestyle monitoring
• Entertainment
• Sentiment analysis
• Calling feature
• Intruder detection

B. FUTURE SCOPE
• Holographic communication
• Access to mail
• Navigation and assistance
• Reading books

VII. CONCLUSION

Akira can serve not only as a home assistant but also as a secure system with intrusion detection capability, operated through a single voice command.
Akira would be a boon for elderly people and can be used in diverse fields.
It can thus be concluded that Akira meets a pressing need and has a large potential market with social utility.

ACKNOWLEDGMENT
We take this opportunity to express our deep sense of gratitude and regard to Mr. Om Krishna Gupta, Asst. Prof. (ECE Deptt.), Ajay Kumar Garg Engineering College, Ghaziabad, for his continuous encouragement and able guidance, which we needed to complete this project.
We also express our sincere gratitude to the Head of the Deptt. (ECE & EI), Prof. P.K. Chopra, for his precious and enlightening words of wisdom, which motivated us throughout our project work.