INTERNATIONAL JOURNAL OF GRAPHICS AND MULTIMEDIA (IJGM)
ISSN 0976 – 6448 (Print), ISSN 0976 – 6456 (Online)
Volume 4, Issue 1, January - April 2013, pp. 31-40
© IAEME: www.iaeme.com/ijgm.asp
Journal Impact Factor (2013): 4.1089 (Calculated by GISI) www.jifactor.com

SCRIPT IDENTIFICATION USING DCT COEFFICIENTS

M. M. Kodabagi 1 , Hemavati C. Purad 2

  • 1 Department of Computer Science and Engineering, Basaveshwar Engineering College, Bagalkot-587102, Karnataka, India

  • 2 Department of Computer Science and Engineering, Tontadarya College of Engineering, Gadag-582101, Karnataka, India

ABSTRACT

Automated systems for understanding low resolution images of display boards facilitate several new applications such as blind assistants, tour guide systems, location-aware systems and many more. Script identification at the word level is one of the most important pre-processing steps in the development of such systems, prior to further image analysis. In this paper, a new approach for word level script identification of text in low resolution images of display boards is presented. The proposed methodology uses horizontal run statistics and texture features for distinguishing three scripts, namely Hindi, Kannada and English. The method computes discrete cosine transform (DCT) based texture features from the input word image and uses a newly defined threshold based discriminant function to identify the script class. The methodology is evaluated on 800 low resolution word images of display boards. The proposed method is robust and insensitive to variations in font size and style, number of characters, thickness and spacing between characters, noise, and other degradations. It achieves an overall identification accuracy of 85.44%, with individual identification accuracies of 100% for Hindi, 70.33% for Kannada and 86% for English.

1. INTRODUCTION

In recent years, camera-embedded handheld systems such as smart mobile phones, tablets and PDAs have come into wide use and increasingly exhibit higher computing and communication capabilities. These devices, with internet access facilities, are used for a wide variety of purposes such as information seeking, mobile commerce, and other business and enterprise applications. One such application is understanding written text


on display boards in an unknown environment. People who travel to different places for field work and business find it difficult to understand written text on display boards, particularly in a foreign environment. This is especially true in multilingual countries such as India. Hence there is a need for a gadget that helps people understand display boards by detecting and translating the written matter while providing localized information. The written matter on display boards/name boards provides important information for the needs and safety of people, and may be written in unknown languages. It can include street names, restaurant names, building names, company names, traffic directions, warning signs, etc.

Researchers have therefore focused their attention on techniques for understanding written text on such display boards, and there is a spurt of activity in the development of web based intelligent handheld systems for such applications. Among the reported works [1-10] on intelligent systems for handheld devices, few pertain to understanding written text on display boards, so scope exists for exploring such possibilities. Text understanding involves several processing steps: text detection and extraction, preprocessing for line, word and character separation, script identification, text recognition and language translation. In the Indian context, the written text on a display board may contain multilingual information; recognition and language translation therefore require script identification at the word level, making it one of the most important processing steps prior to further analysis. Script identification of text in low resolution images of display boards is a difficult and challenging problem due to variations in font size and style, spacing between characters, skew and other degradations.
The reported works on script identification employ a number of different approaches, which can be categorized into local and global methods. Local approaches use connected component analysis to determine the script of text. In contrast, global approaches measure the properties of a region/block of text and give a sufficient characterization of the underlying script. Hence a global approach such as texture analysis is a good choice for this problem.

The task of script identification in a low resolution image of a display board is an important step whose output is used by the later stages of a display board understanding system. In this paper, a new approach for word level script identification of text in low resolution images of display boards is presented. The proposed methodology uses horizontal run statistics and texture features for distinguishing three scripts, namely Hindi, Kannada and English. The method computes discrete cosine transform (DCT) based texture features from the input word image and uses a newly defined threshold based discriminant function to identify the script class. The proposed method is robust and insensitive to variations in font size and style, number of characters, thickness and spacing between characters, noise, and other degradations, and achieves an overall identification accuracy of 85.44%, with individual accuracies of 100% for Hindi, 70.33% for Kannada and 86% for English.

The rest of the paper is organized as follows: a detailed survey related to script identification from images is given in Section 2, the proposed method is presented in Section 3, experimental results and discussion are given in Section 4, and Section 5 concludes the work and lists future directions.


2. RELATED WORKS

A substantial amount of work has gone into research on script identification from printed document images; some of the related works are summarized below. Script identification in a low resolution image of a display board is a necessary step for the other tasks of a display board understanding system. A number of script identification methods have been published in recent years, and they can be categorized into local and global approaches. Local approaches perform connected component analysis and use statistics based features for script identification. A few such methods are summarized in the following.

An approach for determining the script and language of document images is proposed in [11]. The algorithm first determines connected components and locates upward concavities in them, then classifies the script into two broad classes: Han-based (Chinese, Japanese and Korean) and Latin-based (English, French, German and Russian) languages. The Han-based languages are further differentiated using statistics of the optical densities of connected components, while the Latin-based languages are identified from the characteristics of the most frequently occurring word shapes.

An automatic technique for identifying printed Roman, Chinese, Arabic, Devanagari and Bangla text lines from a single document image is found in [12]. The method uses the headline feature to separate Devanagari and Bangla script lines into one group and the other script lines (English, Chinese and Arabic) into another. Zone-wise features are obtained to distinguish Devanagari from Bangla, and vertical run length statistics and water reservoir features are then used to classify Chinese, English and Arabic. Experiments were conducted on 25000 text lines, and identification rates of 97.32%, 98.65%, 97.53%, 96.02% and 97.12% for English, Chinese, Arabic, Devanagari and Bangla scripts respectively are reported.
However, the approach reports higher error rates for short text lines containing a word with few characters.

A method for script and language identification in noisy and degraded document images is employed in [13]. The method identifies the script using a document vectorization technique that converts each image into a vertical cut vector and character extremum points, which characterize the shape and frequency of the contained character or word images. The method is tolerant to variations in text fonts and styles, noise, and various types of document degradation. For each script or language under study, a template is first constructed through a training process; the script and language of a document image are then determined from the distances between the converted document vectors and the pre-constructed script and language templates. Experimental results show that the technique is accurate, easy to extend, and tolerant to noise and various types of document degradation. The authors propose further investigation for images containing perspective and curvature distortion and skew.

In contrast, global approaches measure the texture of a region of text to identify the underlying script. Some of the texture based approaches are detailed below.

A method describing the effectiveness of rotation invariant texture features for automatic script identification is found in [14]. The method computes features from text blocks using multi-channel Gabor filters and constructs a representative feature vector for each language; a Euclidean distance classifier is then used for script identification of six languages (Chinese, English, Greek, Russian, Persian, and Malayalam). An average classification accuracy of 96.7% is reported, and the sensitivity of texture analysis to different fonts is also discussed.


A technique that investigates the use of texture analysis for script and language identification from document images is presented in [15]. The method obtains a uniform block of text from the document image, extracts texture features using multi-channel Gabor filters and gray level co-occurrence matrices (GLCMs), and then applies a K-NN classifier to seven languages: Chinese, English, Greek, Korean, Malayalam, Persian and Russian. The test results showed that Gabor filters are more accurate than GLCMs, producing results that are over 95% accurate.

A texture analysis technique for script identification is described in [16]. The method evaluates commonly used texture features for the purpose of script identification and provides a qualitative measure of which features are most appropriate for this task. The texture features include GLCM, Gabor filter bank energies, and a number of wavelet energy features. The experimental results show that wavelet log co-occurrence features outperform the other techniques, giving the lowest error rate of 1%. The effectiveness of features extracted from co-occurrence histograms of wavelet decomposed images together with a KNN classifier for script identification of seven Indian languages is discussed in [17]. Many recent works on script identification are reported in [18-19].

From the works cited in the literature, it is found that some limitations still exist in the reported script and language identification methods. First, the performance of local approaches depends on correct segmentation of connected components; consequently, they are very sensitive to segmentation errors resulting from noise and various types of document degradation. Second, global techniques need more time to measure the texture of a region, but they are a good choice for the analysis of low resolution images of display boards. Hence, the use of textural features is further investigated in the proposed work.
It is also noticed that global techniques operate on text blocks of predefined size containing matter from a single script when determining the script and language of the underlying document. This is not the case for written text on display boards in the Indian scenario, as the text may contain multilingual information. It is therefore necessary to identify script and language at the word level, which is essential for later processing steps such as text understanding and language translation. Script identification at the word level is difficult and challenging, because the distinguishing properties must be obtained from a small region containing text of variable size and font. More research is therefore needed to model the texture of such small regions for better characterization and classification with reduced computational complexity. Hence, the current work identifies new texture properties based on discrete cosine transform coefficients for script identification in low resolution images of display boards. A detailed description of the proposed methodology is given in the next section.

3. PROPOSED METHODOLOGY FOR SCRIPT IDENTIFICATION

The proposed methodology uses DCT based texture features to identify the script class of low resolution display board images. It comprises three phases: preprocessing, extraction of DCT energy features, and script class identification. The block diagram of the proposed model is given in Fig. 1, and each processing step is described in the following subsections.

[Fig. 1 shows the flow: a test word image is preprocessed for binarization and bounding box generation, then passed to the computational strategy for Hindi script identification. If the Hindi test is satisfied, the word image is classified as Hindi; otherwise, DCT energy features are computed and threshold based classification labels the word image as Kannada or English.]

Fig. 1: Block diagram of proposed method
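As a rough sketch, the three-phase control flow of Fig. 1 might be wired together as follows. All function names here are placeholders for the stages described in the following subsections, and the global-mean binarization is only an assumption (the paper does not fix a binarization method):

```python
import numpy as np

def identify_script(word_img, hindi_test, dct_energies, threshold_rule):
    """Illustrative driver for the pipeline in Fig. 1.

    hindi_test, dct_energies and threshold_rule are stand-ins for the
    stages described in the following subsections; their exact
    implementations are not fixed here.
    """
    # Phase 1: crude global binarization (assumed; see Preprocessing)
    binary = (word_img > word_img.mean()).astype(np.uint8)
    # Stage 1: horizontal-run test for the Devanagari headline
    if hindi_test(binary):
        return "Hindi"
    # Phase 2 + Stage 2: DCT energy features, then the threshold rule
    e1, e2, e3 = dct_energies(binary)
    return threshold_rule(e1, e2, e3)
```

The injected callables make the stage boundaries explicit without committing to any one implementation of each stage.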

3.1 Preprocessing

Works reported in the literature preprocess the document image to obtain a uniform sized text block, detect and correct skew, and remove uneven spacing between lines, words and characters in order to obtain optimal texture features and an improved classification rate, because the presence of noise, skew, uneven spacing and other degradations significantly affects texture features and leads to higher classification errors. Such preprocessing is difficult, computationally expensive, and may not be suitable for applications that process small amounts of text containing a few lines. Hence, in this work, an attempt is made to evaluate the performance of new texture features extracted directly from variable sized word images without removing noise, skew, uneven spacing or other degradations. However, the image is binarized and a bounding box is generated around the text.
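As an illustration, the binarization and bounding box step could look like the following sketch. It assumes dark text on a lighter background and a simple global threshold; the paper does not specify its binarization method:

```python
import numpy as np

def binarize_and_crop(gray):
    """Binarize a grayscale word image with a simple global threshold
    (assumed; the paper's exact binarization method is not specified)
    and crop to the bounding box of the text pixels."""
    binary = (gray < gray.mean()).astype(np.uint8)  # dark text -> 1
    rows = np.any(binary, axis=1)
    cols = np.any(binary, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]             # first/last text row
    c0, c1 = np.where(cols)[0][[0, -1]]             # first/last text column
    return binary[r0:r1 + 1, c0:c1 + 1]
```

The cropped binary image is what the later DCT feature extraction would operate on.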

3.3 Extraction of DCT Energy Features

In this phase, the two-dimensional Discrete Cosine Transform (DCT) is applied to the processed image to obtain a DCT matrix d of size M×N, and energy features E1, E2 and E3 are computed over chosen regions of the DCT coefficients as given in equations (1) to (3).

E1 = Σ (i,j) ∈ R1 d(i,j)^2   .................... (1)

E2 = Σ (i,j) ∈ R2 d(i,j)^2   .................... (2)

E3 = Σ (i,j) ∈ R3 d(i,j)^2   .................... (3)

Where,
d is the DCT matrix of dimension M×N obtained after applying the DCT to the input image,
R1, R2 and R3 are the chosen regions of DCT coefficients, and
Mid1 and Mid2 are the column and row numbers used during computation of energy feature E3.

Fig. 2 shows the regions chosen to calculate the energy features E1, E2 and E3.

Fig. 2: DCT matrix and the three chosen regions for determining energy features E1, E2 and E3
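A sketch of the feature extraction, using an orthonormal 2-D DCT built from NumPy alone. The half-split region boundaries below are illustrative stand-ins for the paper's Mid1/Mid2 (the exact regions are defined by Fig. 2), and the normalization by total energy is likewise an assumption made so the three energies are comparable across image sizes:

```python
import numpy as np

def dct2(x):
    """Orthonormal 2-D DCT-II built from cosine matrices (NumPy only)."""
    def dct_mat(n):
        k = np.arange(n)[:, None]
        i = np.arange(n)[None, :]
        c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
        c[0, :] = np.sqrt(1.0 / n)
        return c
    return dct_mat(x.shape[0]) @ x @ dct_mat(x.shape[1]).T

def dct_energy_features(binary):
    """Energies E1, E2, E3 over three regions of the DCT matrix d.
    mid1/mid2 (half splits) are assumed stand-ins for Mid1/Mid2."""
    d = dct2(binary.astype(float))
    M, N = d.shape
    mid2, mid1 = M // 2, N // 2
    total = np.sum(d ** 2)
    total = total if total > 0 else 1.0           # guard against a blank image
    e1 = np.sum(d[:mid2, :mid1] ** 2) / total     # low-frequency block
    e2 = np.sum(d[:mid2, mid1:] ** 2) / total     # high horizontal frequencies
    e3 = np.sum(d[mid2:, :] ** 2) / total         # high vertical frequencies
    return e1, e2, e3
```

Because the transform is orthonormal, the three region energies of a nonblank image sum to 1 under this normalization.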

3.4 Script Class Identification

The script identification task consists of two processing stages. In stage 1, the test word image is processed to determine whether it belongs to the Hindi script. Otherwise, stage 2 uses threshold based classification to determine whether it belongs to the Kannada or English script. The functionality of both stages is described in the following sections.

3.4.1 Computational strategy for Hindi script identification

In this stage, horizontal run statistics of the test word image are used to determine whether the written word in the display board image belongs to Hindi or to the other scripts. Initially, the horizontal runs of length greater than 6 are computed for every row of the word image and stored in a feature vector, which records the row number and run length count of all runs in all rows. These run length values are thresholded to classify the word image into two classes w1 and w2, where w1 corresponds to the Hindi script and w2 to the other scripts category. A word image classified into class w2 is further processed in stage 2 to determine whether it belongs to the Kannada or English script.
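The stage 1 run computation can be sketched as below. The run collection follows the description above (runs longer than 6 pixels, recorded per row); the final headline decision, a run spanning at least 80% of the word width, is an assumed stand-in for the paper's unspecified threshold:

```python
import numpy as np

def horizontal_runs(binary, min_run=7):
    """Collect (row, run_length) pairs for all horizontal runs of text
    pixels of length greater than 6, row by row."""
    runs = []
    for r, row in enumerate(binary):
        length = 0
        for px in row:
            if px:
                length += 1
            else:
                if length >= min_run:
                    runs.append((r, length))
                length = 0
        if length >= min_run:          # run reaching the right edge
            runs.append((r, length))
    return runs

def is_hindi(binary, headline_fraction=0.8):
    """Stage 1: a near-full-width run signals the Devanagari headline
    (shirorekha). The 0.8 width fraction is an assumed heuristic."""
    width = binary.shape[1]
    return any(length >= headline_fraction * width
               for _, length in horizontal_runs(binary))
```

A Hindi word's connecting headline produces one run close to the full word width, which no isolated Kannada or English character normally does.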


3.4.2 Threshold based classification

The threshold classification phase of the proposed model uses a discriminant function to classify the English and Kannada scripts. The discriminant function uses thresholds to determine the script class; the thresholds are heuristic values chosen empirically. The classification rules using the discriminant function are stated below.

Algorithm 3.4.1: Threshold Based Classification

Input: E1, E2 and E3
Output: Script class: English or Kannada

Begin
    if (E1 >= 0.1000 and E1 <= 0.4000 and E2 >= 0.0200 and E2 <= 0.2000)
        Print "Script is ENGLISH"
    else if (E1 >= 0.0300 and E1 < 0.1000 and E2 >= 0.0100 and E2 <= 0.0850)
        Print "Script is KANNADA"
    end if
End
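Algorithm 3.4.1 translates directly into code. Note that, as in the algorithm, E3 is taken as input but does not appear in either rule, and energy pairs matching neither rule are left unclassified (returned as None in this sketch):

```python
def classify_energies(e1, e2, e3):
    """Threshold based discriminant from Algorithm 3.4.1."""
    if 0.1000 <= e1 <= 0.4000 and 0.0200 <= e2 <= 0.2000:
        return "English"
    if 0.0300 <= e1 < 0.1000 and 0.0100 <= e2 <= 0.0850:
        return "Kannada"
    return None  # neither rule fires
```

The English and Kannada E1 ranges do not overlap (they meet at 0.1), so the order of the two rules does not affect the outcome.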

4. EXPERIMENTAL RESULTS AND DISCUSSION

The proposed methodology for script identification has been evaluated on low resolution word images of display boards with varying font sizes and styles. Experimental tests were conducted for word images of three scripts, Hindi, Kannada and English, and the results were highly encouraging. The results of processing several display board word images dealing with various issues, and the overall performance of the system, are reported in Section 4.1.

4.1 Script Identification: An experimental analysis dealing with various issues

The effectiveness of the proposed methodology for script identification using DCT features has been evaluated on 800 low resolution word images of display boards, captured from display boards of government offices in India. The image database consists of 300 Kannada, 300 English, and 200 Hindi word images of varying resolutions. The images are characterized by a variable number of characters, variable font size and style, uneven thickness and spacing between characters, minimal information context, small skew, noise and other degradations. The proposed methodology has produced good results for low resolution word images containing text of different sizes, fonts, and alignments with varying backgrounds, and it also identifies the script of slightly skewed text regions. The proposed method achieves an overall identification accuracy of 85.44% and individual identification accuracies of 100% for Hindi, 70.33% for Kannada and 86% for English. A closer examination of the results revealed that misclassifications arise from minimal information context, noise and larger skew, which affect the texture of the text region and hence the performance of the texture based approach. Correctly classified images dealing with various issues are described in Table 1, and the overall performance of the system is reported in Table 2.


TABLE 1: Performance of the system on different images dealing with various issues

Resolution | Description
33 x 101   | Script identification of an image having minimal information context.
76 x 127   | Effectiveness of the method in processing a degraded Kannada image containing characters of uneven thickness, lighting and spacing between characters, noise, small skew and other degradations.
36 x 123   | Robustness of the method in identifying the script of an image containing 7 characters and a degraded background.
96 x 351   | The texture of an image having a different font style and large font size is correctly modeled as Kannada script.
132 x 451  | The method processes a larger size blurred image with small skew and classifies it as English text.
78 x 151   | Robustness of the method in processing a degraded, unusual font image having English text.
92 x 190   | Effectiveness of the method in correctly classifying part of a word text.

TABLE 2: Overall System Performance

Word Images  | Classified as Kannada | Classified as English | Classified as Hindi | Accuracy
200 Hindi    | -                     | -                     | 200                 | 100%
300 Kannada  | 211                   | 89                    | -                   | 70.33%
300 English  | -                     | 258                   | -                   | 86%
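The reported overall accuracy of 85.44% corresponds to the unweighted (macro) average of the three per-class accuracies in Table 2, as the quick check below shows; the image-weighted (micro) average over all 800 words works out slightly lower:

```python
# (correctly classified, total) per script, taken from Table 2
per_class = {"Hindi": (200, 200), "Kannada": (211, 300), "English": (258, 300)}

# Macro average: mean of the per-class accuracies
macro = 100.0 * sum(c / n for c, n in per_class.values()) / len(per_class)
# Micro average: correct words over all words
micro = 100.0 * sum(c for c, _ in per_class.values()) / sum(n for _, n in per_class.values())

print(round(macro, 2))  # 85.44, the reported overall accuracy
print(round(micro, 2))
```

Macro averaging weights each script equally regardless of how many test words it contributed.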


5. CONCLUSION

In this paper, an approach for word level script identification in low resolution images of display boards employing DCT energy features is proposed. The method identifies the script of a word image without applying techniques for removing noise and other degradations, which makes it more robust and efficient. The proposed set of texture features models the texture of a region of text well and thus provides a sufficient characterization. The threshold based classification function, based on heuristics, is found to be robust and efficient in improving classification accuracy. Testing the methodology on 800 low resolution word images containing text of different sizes, fonts, and alignments with varying backgrounds has yielded an average classification accuracy of 85.44%. The system is found to be resilient to the presence of small skew and degradations. This is a significant result, which makes the work suitable for text understanding and translation systems, especially in the Indian context. The method can be extended to script identification for other scripts, and further investigations can focus on language identification of word images.

REFERENCES

[1] Abowd Gregory D., Christopher G. Atkeson, Jason Hong, Sue Long, Rob Kooper, and Mike Pinkerton, 1997, "Cyberguide: A mobile context-aware tour guide", Wireless Networks, 3(5), pp. 421-433.
[2] Natalia Marmasse and Chris Schmandt, 2000, "Location-aware information delivery with comMotion", In Proceedings of Conference on Human Factors in Computing Systems, pp. 157-171.
[3] Tollmar K., Yeh T. and Darrell T., 2004, "IDeixis – Image-Based Deixis for Finding Location-Based Information", In Proceedings of Conference on Human Factors in Computing Systems (CHI'04), pp. 781-782.
[4] Gillian Leetch, Eleni Mangina, 2005, "A Multi-Agent System to Stream Multimedia to Handheld Devices", Proceedings of the Sixth International Conference on Computational Intelligence and Multimedia Applications (ICCIMA'05).
[5] Wichian Premchaiswadi, 2009, "A mobile Image search for Tourist Information System", Proceedings of 9th International Conference on Signal Processing, Computational Geometry and Artificial Vision, pp. 62-67.
[6] Ma Chang-jie, Fang Jin-yun, 2008, "Location Based Mobile Tour Guide Services Towards Digital Dunhuang", International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXVII, Part B4, Beijing.
[7] Shih-Hung Wu, Min-Xiang Li, Ping-che Yang, Tsun Ku, 2010, "Ubiquitous Wikipedia on Handheld Device for Mobile Learning", 6th IEEE International Conference on Wireless, Mobile, and Ubiquitous Technologies in Education, pp. 228-230.
[8] Tom Yeh, Kristen Grauman, and K. Tollmar, 2005, "A picture is worth a thousand keywords: image-based object search on a mobile platform", In Proceedings of Conference on Human Factors in Computing Systems, pp. 2025-2028.
[9] Fan X., Xie X., Li Z., Li M. and Ma, 2005, "Photo-to-search: using multimodal queries to search web from mobile phones", In Proceedings of 7th ACM SIGMM International Workshop on Multimedia Information Retrieval.
[10] Lim Joo Hwee, Jean-Pierre Chevallet and Sihem Nouarah Merah, 2005, "SnapToTell: Ubiquitous information access from camera", Mobile Human Computer Interaction with Mobile Devices and Services, Glasgow, Scotland.
[11] Lu Shijian, Chew Lim Tan, 2008, "Script and Language Identification in Noisy and Degraded Document Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(1), January.
[12] Linlin Li, Chew Lim Tan, 2008, "Script identification of camera-based images", 19th International Conference on Pattern Recognition (ICPR 2008), pp. 1-4, 8-11 Dec.
[13] T.N. Tan, 1998, "Rotation Invariant Texture Features and Their Use in Automatic Script Identification", IEEE Trans. Pattern Analysis and Machine Intelligence, 20(7), pp. 751-756.
[14] G.S. Peake and T.N. Tan, 1997, "Script and Language Identification from Document Images", Proc. Eighth British Machine Vision Conf., vol. 2, pp. 230-233, Sept.
[15] A. Busch, W.W. Boles, and S. Sridharan, 2005, "Texture for Script Identification", IEEE Trans. Pattern Analysis and Machine Intelligence, 27(11), pp. 1720-1732.
[16] Hiremath P. S. et al., 2010, "Script identification in a handwritten document image using texture features", IEEE 2nd International Advance Computing Conference, pp. 110-114.
[17] Li Yang, Xuelong Hu, Jun Pan, 2008, "Approaches to image retrieval using fuzzy set theory", International Conference on Neural Networks and Signal Processing, pp. 422-425, 7-11 June.
[18] S. A. Angadi, M. M. Kodabagi, 2012, "Word Level Script Identification of Text in Low Resolution Images of Display Boards using Wavelet Features", Proceedings of International Conference on Advances in Computing (ICADC 2012), AISC 174, pp. 209-220, Springer India.
[19] M. M. Kodabagi and S. R. Karjol, 2013, "Script Identification from Printed Document Images using Statistical Features", International Journal of Computer Engineering & Technology (IJCET), 4(2), pp. 607-622, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[20] P. Prasanth Babu, L. Rangaiah and D. Maruthi Kumar, 2013, "Comparison and Improvement of Image Compression using DCT, DWT & Huffman Encoding Techniques", International Journal of Computer Engineering & Technology (IJCET), 4(1), pp. 54-60, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
