International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 5, Issue 1, January 2015)

Text to Speech Conversion System using OCR


Jisha Gopinath1, Aravind S2, Pooja Chandran3, Saranya S S4
1,3,4Student, 2Asst. Prof., Department of Electronics and Communication, SBCEW, Kerala, India
Abstract- There are about 45 million blind people and 135 million visually impaired people worldwide. Disability of visual text reading has a huge impact on the quality of life for visually disabled people. Although there have been several devices designed for helping the visually disabled to perceive objects using an alternative sense such as sound or touch, the development of text reading devices is still at an early stage. Existing systems for text recognition are typically limited either by explicitly relying on specific shapes or colour masks, or by requiring user assistance, or they may be of high cost. Therefore we need a low cost system that is able to automatically locate and read text aloud to visually impaired persons. The main idea of this project is to recognize text characters and convert them into a speech signal. The text contained in the page is first pre-processed; the pre-processing module prepares the text for recognition. The text is then segmented to separate the characters from each other. Segmentation is followed by extraction of the letters, resizing them, and storing them in a text file. These processes are done with the help of MATLAB. The resulting text is then converted into speech.

Index terms- Binarization, OCR, Segmentation, Templates, TTS.

I. INTRODUCTION
Machine replication of human functions, like reading, is an ancient dream. However, over the last five decades, machine reading has grown from a dream to reality. Speech is probably the most efficient medium for communication between humans. Optical character recognition has become one of the most successful applications of technology in the field of pattern recognition and artificial intelligence. Character recognition, or optical character recognition (OCR), is the process of converting scanned images of machine-printed or handwritten text (numerals, letters, and symbols) into computer-formatted text. Speech synthesis is the artificial synthesis of human speech [1]. A Text-To-Speech (TTS) synthesizer is a computer-based system that should be able to read any text aloud, whether it was directly introduced in the computer by an operator or scanned and submitted to an Optical Character Recognition (OCR) system [1]. Operational stages [2] of the system consist of image capture, image preprocessing, image filtering, character recognition and text to speech conversion. The software platforms used are MATLAB, LabVIEW and the Android platform.

II. TEXT SYNTHESIS
Recognition of scanned document images using OCR is now generally considered to be a solved problem for some scripts. The components of an OCR system are optical scanning, binarization, segmentation, feature extraction and recognition.

Fig 1: Components of an OCR system.

With the help of a digital scanner the analog document is digitized, and the extracted text is pre-processed. Each symbol is extracted through a segmentation process [2]. The identity of each symbol is found by comparing the extracted features with descriptions of the symbol classes obtained through a previous learning phase. Contextual information is used to reconstruct the words and numbers of the original text.
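As an illustration of these stages, a minimal MATLAB sketch (not the authors' code) of the path from a scanned page to segmented character patches is given below; it assumes the Image Processing Toolbox, an input file named page.png, and an arbitrary 42x24 normalised patch size.

% Minimal sketch of the OCR front end described above (illustrative only).
img = imread('page.png');              % optical scanning result
if size(img, 3) == 3
    img = rgb2gray(img);               % discard colour information
end
bw = ~imbinarize(img);                 % Otsu binarization; text becomes foreground (1)

% Segmentation: each connected component is taken as one character candidate.
cc    = bwconncomp(bw);
stats = regionprops(cc, 'BoundingBox', 'Image');

% Feature extraction: a normalised 42x24 patch per symbol, to be compared
% later against stored templates in the recognition stage.
features = cell(1, numel(stats));
for k = 1:numel(stats)
    features{k} = imresize(double(stats(k).Image), [42 24]);
end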

III. SPEECH SYNTHESIS
Speech is the vocalization form of human communication. Speech communication is a more effective medium than text communication in many real world applications. Speech synthesis is the artificial production of human speech. A system used for this purpose is called a speech synthesizer, and it can be implemented in software or hardware. Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood.

IV. TEXT TO SPEECH SYNTHESIS
A Text-To-Speech (TTS) synthesizer is a computer-based system that should be able to read any text aloud. The block diagram given below explains the same [3].

Fig 2: Overall block diagram.

A text-to-speech system (or "engine") is composed of two parts: a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization [4]. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme conversion. The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations), which is then imposed on the output speech.
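As a toy illustration of the front-end's text normalization step only, a short MATLAB sketch is shown below; the expansion table is purely illustrative and is not part of the described system.

% Toy text-normalization pass (illustrative only): expand a few
% abbreviations and single digits into written-out words.
expansions = containers.Map( ...
    {'Dr.', 'St.', '&', '1', '2', '3'}, ...
    {'Doctor', 'Street', 'and', 'one', 'two', 'three'});

raw    = 'Dr. Smith lives at 2 Elm St.';
tokens = strsplit(raw);                 % crude tokenization on whitespace
for k = 1:numel(tokens)
    if isKey(expansions, tokens{k})
        tokens{k} = expansions(tokens{k});
    end
end
normalized = strjoin(tokens, ' ');      % 'Doctor Smith lives at two Elm Street'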

V. SYSTEM IMPLEMENTATION
a) Using LabVIEW
LabVIEW is a graphical programming language that uses icons instead of lines of text to create applications. LabVIEW uses dataflow programming, where the flow of data through the nodes on the block diagram determines the execution order of the VIs and functions. VIs, or virtual instruments, are LabVIEW programs that imitate physical instruments. In LabVIEW, the user builds a user interface by using a set of tools and objects. The user interface is known as the front panel. The user then adds code using graphical representations of functions to control the front panel objects. This graphical source code is also known as G code or block diagram code.
i) LabVIEW Program Structure
A LabVIEW program is similar to a text-based program with functions and subroutines; however, in appearance it functions like a virtual instrument (VI) [5]. A real instrument may accept an input, process it and then output a result. Similarly, a LabVIEW VI behaves in the same manner. A LabVIEW VI has 3 main parts:
a) Front Panel window
Every user-created VI has a front panel that contains the graphical interface with which a user interacts. The front panel can house various graphical objects ranging from simple buttons to complex graphs [6].
b) Block Diagram window
Nearly every VI has a block diagram containing some kind of program logic that serves to modify data as it flows from sources to sinks. The block diagram houses a pipeline structure of sources, sinks, VIs, and structures wired together in order to define this program logic. Most importantly, every data source and sink from the front panel has its analog source and sink on the block diagram. This representation allows the input values from the user to be accessed from the block diagram. Likewise, new output values can be shown on the front panel by code executed in the block diagram.
c) Controls, Functions and Tools Palette
These are windows which contain icons associated with extensive libraries of software functions, subroutines, etc.
ii) Process Flowchart
[Flowchart: recognition branch - Start, Image capture, Create OCR session, Read image and character set file, Get ROI, Read text, Draw bounding boxes, Correlation, Recognize character and write to text file, Text analysis; speech branch - check whether Microsoft Win32 SAPI is available (error if not), Make a server for Win32 SAPI, Get voice object from Win32 SAPI, Compare input string with SAPI string, Extract voice, Wave player initialization, Output speech, Stop.]
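The LabVIEW implementation itself is graphical, but the speech branch of this flowchart maps onto the Win32 SAPI automation interface. A rough MATLAB analogue of the same calls (assuming a Windows machine; the voice name used for matching is only an example) could look like the following.

% Rough MATLAB analogue of the flowchart's speech branch (Windows only).
% The SAPI.SpVoice COM automation server plays the role of the
% "server for Win32 SAPI"; the voice name below is illustrative.
try
    voiceObj = actxserver('SAPI.SpVoice');   % make a server for Win32 SAPI
catch
    error('Microsoft Win32 SAPI is not available on this machine.');
end

voices = voiceObj.GetVoices;                 % enumerate installed voices
wanted = 'Microsoft Zira';                   % example voice name to look for
for k = 0:voices.Count-1                     % compare input string with SAPI strings
    if contains(voices.Item(k).GetDescription, wanted)
        voiceObj.Voice = voices.Item(k);     % select (extract) the matching voice
        break;
    end
end

voiceObj.Rate   = 0;                         % speaking pace, -10 .. 10
voiceObj.Volume = 100;                       % volume, 0 .. 100
voiceObj.Speak('Recognized text goes here'); % output speech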
b) Using Android
Android is a Linux-based operating system for mobile
devices such as smartphones and tablet computers. It is
developed by the Open Handset Alliance led by Google.
Google releases the Android code as open-source, under the Apache License [7]. Android has seen a number of updates since its original release, each fixing bugs and adding new features. Android consists of a kernel based on the Linux kernel, with middleware, libraries and APIs written in C, and application software running on an application framework which includes Java-compatible libraries based on Apache Harmony. Android uses the Dalvik virtual machine with just-in-time compilation to run Dalvik dex-code (Dalvik Executable), which is usually translated from Java bytecode. The main hardware platform for Android is the ARM architecture. There is support for x86 from the Android-x86 project, and Google TV uses a special x86 version of Android.
i) System design
Open-source OCR software called Tesseract is used as the basis for the implementation of the text reading system for the visually disabled on the Android platform. Google is currently developing the project and sponsors its open development. Today, Tesseract is considered the most accurate free OCR engine in existence. The user can select an image already stored on the Android device or use the device's camera to capture a new image; the image is then run through an image rectification algorithm and passed to the Tesseract service. When the OCR process is complete it returns a string of text which is displayed on the user interface screen, where the user is also allowed to edit the text. Then, using the TTS API, our Android device is able to speak text in different languages. The TTS engine that ships with the Android platform supports a number of languages: English, French, German, Italian and Spanish. American and British accents for English are both supported. The TTS engine needs to know which language to speak, so the voice and dictionary are language-specific resources that need to be loaded before the engine can start to speak [8,9].
ii) Process flowchart
[Flowchart: Start, Image capture, Correct orientation, Generate Bitmap in ARGB-8888, Pass image to Tesseract OCR engine, Display the text output given by OCR engine, Pass the text field to TTS API, Output speech, Stop.]
c) Using MATLAB
i) System architecture
The system consists of a portable camera, a computing device and a speaker or headphone. Images can be captured using the camera; for better results we can use a camera with zooming and auto-focus capability. OCR-based speech synthesis applications require a computer system with high processing speed to perform the specified task. It is possible to work with 100 MHz and 16 MB of RAM, but for fast processing (large dictionaries, complex recognition schemes, or high sample rates) we should aim for a minimum of 400 MHz and 128 MB of RAM. Because of the processing required, most software packages list their minimum requirements. An operating system and sound support must be installed on the PC, and the application requires a good quality speaker to produce good quality sound.
ii) Process flowchart
[Flowchart: Image Capture, Image Preprocessing, Image Filtering, Crop Lines, Crop Letters, Resize Letters, Load Templates, Correlation, Extract Letters, Write to Text File, Text Analysis; then the speech stage - check whether Win32 SAPI is available, Make a server for Win32 SAPI, Get voice object from Win32 SAPI, Compare input string with SAPI string, Extract voice, Wave Player initialization, Output Speech.]
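As a minimal sketch of the correlation-based recognition stage in this flowchart (not the authors' exact code; the template file name, its variable names and the output file name are assumptions), the MATLAB step could look like the following.

function recognized = recognize_letters(letters)
% Correlation-based recognition stage from the flowchart above (sketch only).
% 'letters'       : cell array of segmented binary letter images (from segmentation).
% 'templates.mat' : assumed file holding a cell array 'templates' of letter
%                   templates and a char vector 'labels' with their characters.
load('templates.mat', 'templates', 'labels');

recognized = '';
for k = 1:numel(letters)
    letter = imresize(letters{k}, size(templates{1}));       % resize to template size
    scores = zeros(1, numel(templates));
    for t = 1:numel(templates)
        scores(t) = corr2(double(letter), double(templates{t}));  % correlation
    end
    [~, best]  = max(scores);                                 % highest correlation wins
    recognized = [recognized, labels(best)];                  %#ok<AGROW>
end

fid = fopen('recognized.txt', 'w');                           % write to text file
fprintf(fid, '%s', recognized);
fclose(fid);
end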
VI. RESULT
The text reading system has two main parts: image to text conversion and text to voice conversion. The image is converted into text, and that text into speech, using MATLAB, LabVIEW and the Android platform. For image to text conversion, the image is first converted into a gray image, then into a black and white image, and then converted into text using MATLAB and LabVIEW; on the Android platform, however, the RGB image is processed directly. The Microsoft Win32 Speech Application Program Interface (SAPI) library has been used to produce speech from the recognized text in MATLAB and LabVIEW. This library allows selecting the voice and audio device one would like to use. We can select a voice from the list and change the pace and volume, and the result can be heard by installing a wave player. The Android platform implementation uses the Android text to speech application program interface.

a) Using LabVIEW
Input: [input image]
Output:
Fig 3: Front panel of the text reading system.
b) Using Android
Input: [captured input image]
Output: [output text]

c) Using MATLAB
Input: [input image]
Output: [output]

VII. CONCLUSION
This paper is an effort to suggest an approach for image to speech conversion using optical character recognition and text to speech technology. The application developed is user friendly, cost effective and applicable in real time. With this approach we can read text from a document, web page or e-book and generate synthesized speech through a computer's speakers or a phone's speaker. The developed software encodes, for each and every alphabet character, its pronunciation methodology and the way it is used in grammar and the dictionary. This can save time by allowing the user to listen to background material while performing other tasks. The system can also be used for information browsing by people who do not have the ability to read or write. The approach can also be used in parts: image to text conversion alone is possible, and text to speech conversion alone is also easily possible. People with poor vision, visual dyslexia or total blindness can use this approach for reading documents and books, and people with speech loss can utilize it to turn typed words into vocalization. Experiments have been performed to test the text reading system and good results have been achieved.
REFERENCES
[1] T. Dutoit, "High quality text-to-speech synthesis: a comparison of four candidate algorithms," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-94), vol. 1, pp. I/565-I/568, 19-22 Apr 1994.
[2] B. M. Sagar, Shobha G and R. P. Kumar, "OCR for printed Kannada text to machine editable format using database approach," WSEAS Transactions on Computers, vol. 7, pp. 766-769, June 2008.

[3] http://www.voicerss.org/tts/
[4] http://www.comsys.net/technology/speechframe/text-to-speech-tts.html
[5] Christopher G. Relf, Image Acquisition and Processing with LabVIEW, CRC Press, 2004.
[6] http://www.rspublication.com/ijst/aug%2013/6.pdf
[7] Sonia Bhaskar, Nicholas Lavassar and Scott Green, "Implementing Optical Character Recognition on the Android Operating System for Business Cards," EE 368 Digital Image Processing.
[8] J. Liang et al., "Geometric Rectification of Camera-captured Document Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 591-605, July 2006.
[9] G. Zhu and D. Doermann, "Logo Matching for Document Image Retrieval," International Conference on Document Analysis and Recognition (ICDAR 2009), pp. 606-610, 2009.

Authors Biographies

Jisha Gopinath, pursuing final year BTech degree in Electronics and Communication Engineering from Mahatma Gandhi University, Kerala, India. Completed Diploma in Electronics Engineering from the Technical Board of Education, Kerala.

Aravind S, Assistant Professor in the Department of Electronics and Communication Engineering, Sree Buddha College of Engineering for Women, Mahatma Gandhi University, Kerala, India. He obtained his M.Tech degree in VLSI and Embedded Systems with Distinction from Govt. College of Engineering Chengannur, Cochin University, in 2012. He received his B.Tech degree in Electronics and Communication Engineering with Distinction from the main campus of Cochin University of Science and Technology, School of Engineering, Kerala, India, in 2009. He has published ten research papers in various international journals and has presented three papers in national conferences. He has excellent and consistent academic records and very good verbal and written communication skills. He has guided nine projects for graduate engineering students and one project for a P.G. student. He has academic experience of 3 years and industrial experience of 1.6 years. For post-graduate students he has handled subjects such as Electronic Design Automation Tools, VLSI Circuit Design and Technology, Designing with Microcontrollers, and Adaptive Signal Processing. He has taught subjects such as Network Theory, DSP, Embedded Systems, Digital Electronics, Microcontroller and Applications, Computer Organisation and Architecture, Microprocessor and Applications, Microwave Engineering, Computer Networks and VLSI for B.Tech students.

Pooja Chandran, pursuing final year BTech degree in Electronics and Communication Engineering from Mahatma Gandhi University, Kerala, India.

Saranya S S, pursuing final year BTech degree in Electronics and Communication Engineering from Mahatma Gandhi University, Kerala, India. Completed Diploma in Electronics Engineering from the Technical Board of Education, Kerala.
