Document Image Processing (DIP) deals with the conversion of text document images into a computer-readable format. DIP has two phases: binarization and character recognition. Binarization, also known as thresholding, is the key process, because character recognition depends heavily on it. It separates the foreground text region from the background surface. For a uniform background, binarization is easy because foreground and background are distinctly separated. For a non-uniform or noisy background, special binarization techniques are required.
Binarization techniques are broadly classified into two categories: global and adaptive. Global binarization uses a single threshold value to separate foreground from background, and it performs poorly in noisy conditions. Adaptive binarization selects the threshold value adaptively and therefore performs better on degraded document images.
Adaptive binarization is used for degraded document images. The entire document image is scanned by a small window, whose size may vary from one algorithm to another. The threshold value is selected based on the pixel information under the window. This pixel information generally consists of statistics such as the mean and standard deviation, and the statistics used differ between binarization techniques.
Thus, a different threshold is obtained for each window position. Because noise is generally present in the background surface of document images, adaptive binarization can handle it. The performance of the binarization depends on the threshold value chosen in the degraded portions of the document.
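To illustrate why a single global threshold can fail on a non-uniform background, the following Python sketch binarizes one toy scan line with a global threshold and with a window-based threshold. The pixel values, the window size and the local rule (a pixel is text if it is darker than 70% of the local mean) are illustrative assumptions and are not taken from any of the methods discussed here.

```python
def global_binarize(row, t):
    """Label each pixel with one threshold: 0 = text, 1 = background."""
    return [0 if v < t else 1 for v in row]


def local_binarize(row, half=1, ratio=0.7):
    """Label each pixel against the mean of its own window.

    `ratio` is a made-up constant for this toy example; real adaptive
    methods derive the threshold from richer window statistics.
    """
    out = []
    for i, v in enumerate(row):
        window = row[max(0, i - half): i + half + 1]  # clipped at the ends
        local_mean = sum(window) / len(window)
        out.append(0 if v < ratio * local_mean else 1)
    return out


# Bright background (220) with text (120) on the left,
# dark background (100) with text (20) on the right.
row = [220, 220, 120, 220, 220, 100, 100, 20, 100, 100]
truth = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]

# No single threshold reproduces the ground truth ...
global_ok = any(global_binarize(row, t) == truth for t in range(257))
# ... but a threshold that follows the window does.
local_ok = local_binarize(row) == truth
print(global_ok, local_ok)  # False True
```

Any global threshold high enough to catch the lighter text on the left also swallows the darker background on the right, which is the situation adaptive methods are meant to address.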
In the character recognition phase, the resulting binary image is processed. The binary image is first segmented, and then character recognition is performed, typically by an Optical Character Recognition (OCR) process. The binary images produced by different binarization techniques can therefore be evaluated through OCR.
Four adaptive binarization methods are evaluated: Niblack's method, Sauvola's method, the method of Gatos et al., and the method of Halabi et al. The performance of each method is evaluated by OCR, and the OCR results of the different techniques are compared.
Niblack's method is based on the local mean and standard deviation of the pixels under the window. The threshold value is calculated by the following formula: T = m + k * s, where m is the local mean, s is the local standard deviation, and k is a constant usually set to -0.2.
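The following is a minimal pure-Python sketch of Niblack's threshold T = m + k * s applied with a sliding window. The function names, the 3x3 default window, the border clipping and the convention that a pixel strictly darker than T is text (0) are assumptions made for this illustration; the text above only fixes the formula and k = -0.2.

```python
import math


def local_stats(img, r, c, half):
    """Mean and population standard deviation of the window centred on (r, c).

    The window is clipped at the image border (an assumption of this sketch).
    """
    rows, cols = len(img), len(img[0])
    values = [
        img[i][j]
        for i in range(max(0, r - half), min(rows, r + half + 1))
        for j in range(max(0, c - half), min(cols, c + half + 1))
    ]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def niblack_binarize(img, half=1, k=-0.2):
    """Return a binary image: 0 = text (foreground), 1 = background."""
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            m, s = local_stats(img, r, c, half)
            t = m + k * s  # Niblack threshold for this window position
            row.append(0 if img[r][c] < t else 1)
        out.append(row)
    return out


# Toy image: a dark vertical stroke (50) on a bright background (200).
stroke = [[200, 200, 50, 200, 200] for _ in range(5)]
binary = niblack_binarize(stroke)
for line in binary:
    print(line)  # each row prints [1, 1, 0, 1, 1]
```

Near the stroke the window contains both dark and bright pixels, so the threshold falls between them and the stroke is recovered cleanly.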
Sauvola's method is a slightly modified version of Niblack's method. It considers the dynamic range of the standard deviation along with the mean and standard deviation. The threshold value is given by the following formula: T = m * (1 - k * (1 - s/R)), where R is the dynamic range of the standard deviation. For grayscale images, the value of R is set to 128.
The method of Gatos et al. uses Sauvola's binarization technique as an intermediate step. It consists of five distinct steps:
1. Preprocessing
2. Rough estimation of the foreground text region
3. Approximate background surface calculation
4. Final thresholding
5. Post-processing
In DIP, documents are usually preprocessed to reduce the degree of noise. Filtering the document image generally reduces noise, and a Gaussian filter, an average filter or a median filter can be used for this purpose. In this method, an adaptive Wiener filter is used. The window size of the filter depends on several factors, such as the size and thickness of the text characters. This filtering blurs the original image.
Sauvola's method is used for the rough estimation of the foreground text region. The Wiener-filtered grayscale image is binarized by Sauvola's method, and all 0s in the output image correspond to text regions. The pixels classified as text in this step form a superset of the correct text pixels.
A background surface is then produced using the filtered grayscale image and the roughly estimated text region. Where the rough foreground estimate marks a pixel as non-text, the background surface keeps the value of the Wiener-filtered grayscale image. Where it marks a pixel as text, the value is determined by interpolating the neighboring pixels around it.
Final thresholding is done by combining the estimated background surface with the filtered grayscale image.
The distance between each pixel of the original image and the estimated background image is calculated and compared with a threshold. A pixel belongs to the text region if the distance is greater than the threshold. The threshold changes according to the value of the grayscale background image at that pixel, which preserves textual information in the presence of a degraded background.
Post-processing is required to further reduce the noise level in the output binary image. It is done using shrink and swell filters. The shrink filter examines each foreground pixel: if the number of background pixels under the window is greater than some threshold value, the foreground pixel is converted to a background pixel. This thins the characters and reduces salt-and-pepper noise in the background surface. The swell filter examines every background pixel: if the number of foreground pixels under the window is greater than the threshold, the background pixel is converted to a foreground pixel. This fills holes in the text and increases the thickness of the characters. The size of the filtering window and the threshold value depend on the average character size.
The method of Halabi et al. is almost the same as the method of Gatos et al., except that it uses a Gaussian filter in the preprocessing step. The Gaussian filter causes blurring of edges.
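The shrink and swell filters of the post-processing step can be sketched in pure Python as follows. This is a simplified reading of the description above: the 3x3 window, the thresholds `k_sh` and `k_sw`, the exclusion of the centre pixel from the count and the border clipping are all assumptions for illustration, and any further conditions used in the published method are not modelled.

```python
FG, BG = 0, 1  # 0 = text (foreground), 1 = background


def count_neighbours(img, r, c, half, value):
    """Count pixels equal to `value` in the window centred on (r, c).

    The centre pixel is excluded and the window is clipped at the border.
    """
    rows, cols = len(img), len(img[0])
    return sum(
        1
        for i in range(max(0, r - half), min(rows, r + half + 1))
        for j in range(max(0, c - half), min(cols, c + half + 1))
        if (i, j) != (r, c) and img[i][j] == value
    )


def shrink(img, half=1, k_sh=6):
    """Convert a foreground pixel to background if more than k_sh
    background pixels lie in its window."""
    return [
        [
            BG if px == FG and count_neighbours(img, r, c, half, BG) > k_sh else px
            for c, px in enumerate(row)
        ]
        for r, row in enumerate(img)
    ]


def swell(img, half=1, k_sw=6):
    """Convert a background pixel to foreground if more than k_sw
    foreground pixels lie in its window."""
    return [
        [
            FG if px == BG and count_neighbours(img, r, c, half, FG) > k_sw else px
            for c, px in enumerate(row)
        ]
        for r, row in enumerate(img)
    ]


speck = [[BG] * 5 for _ in range(5)]
speck[2][2] = FG  # isolated noise pixel on the background
holed = [[FG] * 5 for _ in range(5)]
holed[2][2] = BG  # one-pixel hole inside a character

cleaned = shrink(speck)  # the isolated pixel is removed
filled = swell(holed)    # the hole is filled
```

Both filters read their counts from the input image and write to a new image, so the result does not depend on the scan order.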
With Niblack's method, binarization performs well in areas near the text region. However, a large amount of noise is observed in non-textual portions, which is the main drawback of Niblack's approach.
With Sauvola's method, binarization performance is good in the presence of degradations such as a non-uniform background or low contrast. It solves the problem of Niblack's method, i.e. the noisy binarization of non-textual regions, by taking the dynamic range of the grayscale background into account. But as the background degradation becomes severe, the binarization performance decreases.
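The difference in behaviour on non-textual regions can be seen by evaluating both thresholds on a single background window. The sketch below assumes Sauvola's threshold in the form T = m * (1 - k * (1 - s/R)) with R = 128; the value k = 0.5 for Sauvola and the pixel values of the window are assumptions chosen for illustration.

```python
import statistics


def niblack_threshold(m, s, k=-0.2):
    """Niblack: T = m + k * s."""
    return m + k * s


def sauvola_threshold(m, s, k=0.5, R=128):
    """Sauvola: T = m * (1 - k * (1 - s / R)); k = 0.5 is assumed here."""
    return m * (1 - k * (1 - s / R))


def count_flagged(window, threshold):
    """Number of pixels that would be labelled as text (darker than T)."""
    return sum(1 for v in window if v < threshold)


# A 3x3 window of mildly noisy background that contains no text at all.
background_window = [200, 204, 196, 200, 190, 200, 204, 196, 210]
m = statistics.mean(background_window)
s = statistics.pstdev(background_window)

niblack_noise = count_flagged(background_window, niblack_threshold(m, s))
sauvola_noise = count_flagged(background_window, sauvola_threshold(m, s))
print(niblack_noise, sauvola_noise)  # 3 0
```

Because the standard deviation is small in a background window, Niblack's threshold stays just below the local mean and slightly darker noise pixels are labelled as text, whereas Sauvola's threshold drops to roughly half the mean and leaves the window clean.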
The method proposed by Gatos et al. performs well in the presence of a degraded background. Its threshold calculation depends on the estimated background surface, so it binarizes the text image while accounting for the degradation in the background. The method proposed by Halabi et al. also performs well in the presence of degradation. The Gaussian function used in the preprocessing step handles the noise present in the original grayscale document image. However, the Gaussian function blurs edges, and information may be lost because of this blurring.
Gatos B., Pratikakis I., Perantonis S. J., Adaptive Degraded Image Binarization, Journal of Pattern Recognition 39 (2006)
Sauvola J., Pietikainen M., Adaptive Document Image Binarization, Pattern Recognition 33 (2000)
Gatos B., Pratikakis I., Perantonis S. J., An Adaptive Binarization Technique for Low Quality Historical Documents (2004)
Halabi Y. S., Zaid SA, Hamdan F., Haj Yousef K., Modeling Adaptive Degraded Document Image Binarization and Optical Character System (2009)
Niblack W., An Introduction to Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ (1986)
ABBYY (www.finereader.com)