Volume: 4 Issue: 3
ISSN: 2321-8169
511 - 516
_______________________________________________________________________________________
Text Extraction from Captured Image and Conversion to Audio for Smart Phone
Application
Sneha V. Deshmukh¹, Jaishree M. Kamble¹, Abhilasha B. Kharate¹, Pooja G. Deo¹,
Supriya A. Khadasne¹, Prof. V. B. Gadicha²
¹ Students of Computer Science & Engineering Department, P.R. Pote COE, Amravati University, India
² HOD of Computer Science & Engineering Department, P.R. Pote COE, Amravati University, India
Email: snehadeshmukh33@gmail.com, jayashrikamble400@gmail.com, abhilashakharate@gmail.com,
poojadeo18@gmail.com, supriya.khadasne@gmail.com, headcse1108@gamil.com
Abstract— Text extraction from an image captured by a smart phone is a difficult task due to cluttered backgrounds and non-textual regions. Moreover, the text appears in a variety of fonts, styles, and sizes, and consists of dissimilar words where every word may contain different characters, resulting in large variations of text patterns. Even if the problems of cluttered background and text separation are ignored for the moment, there are several other difficulties: font style and size variations from word to word or character to character; background and foreground colour; camera position, which can introduce distortions; brightness; and image resolution. In the proposed technique, a colour image is first captured with the mobile camera. The colour image is converted into a grayscale image, and the grayscale image is then converted into a binary image. The binary image is given to the Optical Character Recognition (OCR) engine, which recognizes and extracts the text from the image and passes it to the Text-to-Speech (TTS) engine. The TTS engine converts the text into audio.
Keywords-- Text Extraction, OCR engine, Text to Speech engine, Smart Phone Application
____________________________________________________*****_________________________________________________
I. INTRODUCTION
Extracting text from captured images or videos is an
important problem in many applications such as document
processing, image indexing, video content summarization, video
retrieval, and video understanding. In captured images and videos,
text characters and strings usually appear on nearby sign boards and
hand-held objects and provide significant knowledge of the
surrounding environment and objects. Captured images usually
suffer from low resolution, low quality, perspective distortion,
and complex backgrounds. [1]
Text in captured images is hard to detect, extract, and
recognize since it can appear at any slant or tilt, in any lighting,
on any surface, and may be partially occluded. Many
approaches for text detection in captured images have been
proposed recently. To extract text information from a captured
image on a mobile device, an automatic and efficient scene-text
detection algorithm is essential. A character descriptor has been
proposed to extract representative and discriminative features from
character patches; it combines several feature detectors with Optical
Character Recognition (OCR) [1]. The main focus of our project is
that a visually challenged person can obtain, in audio form, the text
information present on text boards, traffic sign boards, and
hoardings. With this point of view, the application is designed as a
camera-based reading system that extracts text from a text board,
identifies the text characters and strings in the captured image, and
finally converts the text into audio. This system allows the user to
photograph a text image, such as a Stop sign board on the roadside,
click the speak button, and hear the text aloud, giving real-time
feedback. Extracting text information from a captured image is a
difficult task due to cluttered backgrounds and non-text outliers;
further, text consists of dissimilar words where every word may
contain different characters in a variety of fonts, styles, and sizes,
resulting in large intra-variations of text patterns. Even if the
problems of cluttered background and text segmentation were to be
ignored for the moment, there are several other difficulties, such as
font style and thickness; background as well as foreground colour
and texture; camera position, which can introduce geometric
distortions; illumination; and image resolution. Optical character
recognition is the electronic conversion of photographed images of
printed text into computer-readable text. A text-to-speech system
converts normal language text into speech; it is usually meant to
help visually challenged people, among others. [2]
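The preprocessing pipeline described above (colour image to grayscale to binary) can be sketched in a few lines. This is a minimal pure-Python illustration, not the application's actual Android code; the function names, the nested-list image representation, and the fixed threshold of 128 are our assumptions (the standard ITU-R BT.601 luma weights are used for the grayscale step):

```python
# Minimal sketch of the preprocessing pipeline: colour -> grayscale -> binary.
# An image is assumed to be a list of rows; a colour row holds (R, G, B)
# tuples with components in 0..255.

def to_grayscale(rgb_image):
    """Convert an RGB image to grayscale using ITU-R BT.601 luma weights."""
    return [[int(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def to_binary(gray_image, threshold=128):
    """Binarize: pixels brighter than the threshold become 1 (background),
    darker pixels become 0 (likely text strokes on a light board)."""
    return [[1 if p > threshold else 0 for p in row] for row in gray_image]

if __name__ == "__main__":
    img = [[(255, 255, 255), (10, 10, 10)],
           [(200, 30, 30), (0, 0, 0)]]
    gray = to_grayscale(img)      # single intensity channel per pixel
    binary = to_binary(gray)      # image ready for the OCR engine
    print(gray)
    print(binary)
```

In the real application the binary image produced by this step is what gets handed to the OCR engine; a fixed threshold is only a placeholder for the Otsu method discussed later in the paper.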
II. LITERATURE SURVEY
Several methods can be used for extracting text from images
such as document images and scene images. Text present in an
image carries useful and important information. One surveyed
approach employs the discrete wavelet transform (DWT) to extract
the text from an image. The input image can be a colour image or a
grayscale image; if it is a colour image, preprocessing is done
first. To extract the text edges from the image, a Sobel edge
detector is then applied to each sub-image.
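As a rough illustration of the DWT step (a minimal pure-Python sketch under our own assumptions, not the surveyed paper's implementation), a one-level 2-D Haar transform splits an image into an approximation sub-band and three detail sub-bands; the detail coefficients are large exactly where intensity changes sharply, e.g. at text edges. The function names and the toy input are ours, and even image dimensions are assumed:

```python
# One-level 2-D Haar wavelet transform: a toy version of the DWT step.
# The detail sub-bands respond strongly at sharp intensity changes,
# which is what makes the DWT useful for locating text edges.

def haar_1d(row):
    """One level of the 1-D Haar transform: (averages, differences)."""
    avg = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    diff = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
    return avg, diff

def haar_2d(image):
    """One level of the 2-D Haar transform.

    Returns (ll, lh, hl, hh): the approximation sub-band and the
    three detail sub-bands; image dimensions must be even.
    """
    # Transform each row into low-pass and high-pass halves.
    lo, hi = zip(*(haar_1d(row) for row in image))

    def cols(block):
        # Transform the columns of one half.
        avgs, diffs = zip(*(haar_1d(col) for col in zip(*block)))
        return ([list(r) for r in zip(*avgs)],
                [list(r) for r in zip(*diffs)])

    ll, lh = cols(lo)
    hl, hh = cols(hi)
    return ll, lh, hl, hh
```

For an image with a sharp vertical edge, such as `[[0, 8], [0, 8]]`, one detail sub-band carries a large coefficient while the others stay at zero, which is the property an edge detector applied on each sub-image then exploits.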
ii. Image I/O:
Some facilities have been provided for image input and
output. These are required to build executables that handle
images; many examples of such programs, most of which are
for testing, can be built in the prog directory. Functions are
provided to allow reading and writing of files in JPEG,
PNG, TIFF, BMP, PNM, GIF, WEBP, and JP2 formats.
iii. Tesseract API:
Since Tesseract is written in the C++ programming language, it is
no trivial task to use it on the Java-based Android OS. The C++
code needs to be wrapped in a Java class and run natively via the
Java Native Interface (JNI). Though there is some effort
involved, one great benefit of running Tesseract natively is that
C++ is substantially faster than Java. Tesseract-OCR uses liblept
(Leptonica) mainly for image I/O, but any of Leptonica's
many image processing functions can also be used on a PIX
while calling TessBaseAPI methods.
Architecture of Tesseract:
Tesseract works with an independently developed page layout
analysis technology; hence Tesseract accepts its input as a
binary image. Tesseract can handle both the traditional black-on-
white text and vice versa. Outlines of components are stored
during connected component analysis. Nesting of outlines then
gathers the outlines together to form a blob, and such blobs
are organized into text lines. Text lines are analyzed for fixed-
pitch and proportional text, and the lines are broken into words
according to the character spacing: fixed-pitch text is
chopped into character cells, while proportional text is broken into
words by definite spaces and fuzzy spaces. Tesseract then
recognizes the words; this recognition activity mainly
consists of two passes. The first pass tries to recognize the words,
and each satisfactory word is passed to an adaptive classifier as
training data, which recognizes subsequent text more accurately. During
the second pass, the words that were not recognized well in the first
pass are recognized again in another run over the page. Finally,
Tesseract resolves fuzzy spaces and, to locate small and capital text,
checks alternative hypotheses for the x-height.
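Tesseract's connected component analysis is part of its C++ core and stores component outlines; as a minimal toy sketch of the underlying idea only (our own function names and a much-simplified representation, not Tesseract's code), grouping the ink pixels of a binary image into blobs can be done with a flood fill:

```python
from collections import deque

# Toy sketch of connected component analysis on a binary image
# (1 = ink pixel). Tesseract stores component outlines; here each
# "blob" is simply the set of 4-connected ink pixel coordinates.

def find_blobs(binary):
    """Return a list of blobs; each blob is a list of (row, col) pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 1 and not seen[y][x]:
                # Flood-fill one component with breadth-first search.
                queue, blob = deque([(y, x)]), []
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    blob.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                blobs.append(blob)
    return blobs
```

Each blob found this way would correspond to one candidate character component; organizing blobs into lines and words is the layout analysis described above.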
IJRITCC | March 2016, Available @ http://www.ijritcc.org
d) Acoustic Processing:
This stage performs formant synthesis. It works intelligently and
thus does not require any database of speech samples. To speak
the text aloud, it uses the voice characteristics of a person.
IV. SYSTEM IMPLEMENTATION
System implementation is one of the important stages of
software design. In system implementation we actually
implement the optical character recognition (OCR) algorithm
and the Otsu algorithm for text recognition and extraction. To do
the actual implementation of this application we take support of
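The Otsu algorithm named above picks the binarization threshold automatically by maximizing the between-class variance of the grayscale histogram. The following is a minimal pure-Python illustration of that method, not the app's actual code; the function name is ours:

```python
def otsu_threshold(gray_pixels):
    """Return the threshold that maximizes between-class variance
    (Otsu's method). gray_pixels: flat iterable of intensities 0..255."""
    hist = [0] * 256
    for p in gray_pixels:
        hist[p] += 1
    total = sum(hist)

    sum_all = sum(i * hist[i] for i in range(256))
    sum_bg = 0.0      # weighted intensity sum of the background class
    weight_bg = 0     # pixel count of the background class
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance for this candidate split.
        var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

For a bimodal image (e.g. dark letters on a bright board) the returned threshold falls between the two intensity modes, so the subsequent binarization cleanly separates text strokes from background before the image is handed to the OCR engine.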
c) Crop an image as per our requirement and save it:
VII. REFERENCES
__________________________________________________________________________________________