
Computer-originated documents are a main form of transmitting information, and Document Image Processing (DIP) deals with the conversion of text document images into a computer-readable format.
Two phases of DIP: Binarization and Character
Recognition.
Binarization is the key process, as character
recognition significantly depends upon it.
It is also known as thresholding.
It is the process of separating the foreground text
region from the background surface.
For a uniform background, binarization is easy because
there is a distinct separation between foreground and
background.
For a non-uniform or noisy background, special
binarization techniques are required.
Binarization techniques are broadly classified into two
categories: global and adaptive.
In global binarization, a single threshold value is used
to separate the foreground from the background.
It gives poor performance in noisy environments.
In adaptive binarization, the threshold value is selected
adaptively; hence, it gives better performance for
binarization of degraded document images.
Adaptive binarization is therefore used for degraded DIP.
The entire document image is scanned by a small
window. The window size may vary between algorithms.
The threshold value is selected based on the pixel
information under the window. The pixel information
generally includes statistical measures such as the
mean and standard deviation.
The statistical information used differs between
binarization techniques.

Thus, we get a different threshold for each window
position.
Since noise is generally present in the background surface
of document images, adaptive binarization can handle
such noise.
The performance of binarization depends on the
threshold values chosen in the degraded portions of the
document.

In the character recognition phase, the resulting binary image is processed.
The binary image is first segmented, and then
character recognition is performed.
This character recognition can be done by an Optical
Character Recognition (OCR) process.
The performance of different binarization techniques
can be evaluated by running OCR on the resulting
binary images.
Four different adaptive binarization methods are
evaluated: Niblack's method, Sauvola's method, Gatos
et al.'s method, and finally Halabi et al.'s method.
The performance of each method is evaluated by OCR,
and a comparison is made between the OCR results of
the different techniques.
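As an illustration of this evaluation step, the sketch below compares OCR output against a known ground-truth transcription. The source uses ABBYY FineReader for OCR; here the open-source Tesseract engine (via the pytesseract package) and a simple similarity ratio stand in as assumed substitutes, not the tools named in the source.

    from difflib import SequenceMatcher

    from PIL import Image
    import pytesseract  # open-source OCR engine, used here in place of ABBYY FineReader


    def ocr_similarity(binary_image_path, ground_truth_text):
        """Return a rough 0..1 similarity between OCR output and the ground truth."""
        recognized = pytesseract.image_to_string(Image.open(binary_image_path))
        return SequenceMatcher(None, recognized, ground_truth_text).ratio()


    # Hypothetical usage: a higher ratio means the binarization preserved more text.
    # score = ocr_similarity("sauvola_output.png", open("ground_truth.txt").read())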
Niblack's method computes the threshold from the local
mean and standard deviation of the pixels under the
window.
The threshold value is calculated by the following
formula:
T = m + k * s
where m is the local mean, s is the local standard
deviation, and k is a constant, usually set to -0.2.
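A minimal sketch of this sliding-window thresholding, assuming a NumPy/SciPy implementation; the 15x15 window and the helper name are illustrative choices, not taken from the source.

    import numpy as np
    from scipy.ndimage import uniform_filter


    def binarize_niblack(image, window=15, k=-0.2):
        """Binarize a grayscale image with the per-pixel threshold T = m + k * s."""
        img = image.astype(np.float64)
        mean = uniform_filter(img, size=window)               # local mean m
        mean_sq = uniform_filter(img ** 2, size=window)       # local mean of squares
        std = np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))   # local standard deviation s
        threshold = mean + k * std                            # Niblack threshold
        return np.where(img < threshold, 0, 255).astype(np.uint8)  # 0 = text, 255 = background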


Sauvola's method is a slightly modified version of Niblack's method.
It considers the dynamic range of the standard deviation
along with the local mean and standard deviation.
The threshold value is given by the following formula:
T = m * (1 - k * (1 - s / R))
where R is the dynamic range of the standard deviation. For
grayscale images, the value of R is set to 128.
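The same sliding-window machinery carries over; only the threshold formula changes. In the sketch below, k = 0.5 follows the value recommended in the Sauvola and Pietikainen paper, and the window size is again an assumption.

    import numpy as np
    from scipy.ndimage import uniform_filter


    def binarize_sauvola(image, window=15, k=0.5, R=128.0):
        """Binarize with the per-pixel threshold T = m * (1 - k * (1 - s / R))."""
        img = image.astype(np.float64)
        mean = uniform_filter(img, size=window)               # local mean m
        mean_sq = uniform_filter(img ** 2, size=window)
        std = np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))   # local standard deviation s
        threshold = mean * (1.0 - k * (1.0 - std / R))        # Sauvola threshold, R = 128
        return np.where(img < threshold, 0, 255).astype(np.uint8)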
Gatos et al.'s method uses Sauvola's binarization technique as
an intermediate step.
It consists of five distinct steps:
1. Preprocessing
2. Rough estimation of foreground text region
3. Approximate background surface calculation
4. Final thresholding
5. Post processing
In DIP, a preprocessing step is usually performed on the
documents to reduce the degree of noise.
Filtering the document image generally reduces noise.
A Gaussian filter, average filter, or median filter can be used
for this purpose.
In this method, an adaptive Wiener filter is used.
The window size of the filter depends on several
factors, such as the size and stroke thickness of the text characters.
This process causes a blurring effect on the original
image.
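A sketch of this preprocessing step using SciPy's adaptive Wiener filter; the 5x5 window is an assumed value, since in practice it would be chosen from the character size and stroke thickness.

    import numpy as np
    from scipy.signal import wiener


    def preprocess_wiener(image, window=5):
        """Suppress background noise with an adaptive Wiener filter before binarization."""
        filtered = wiener(image.astype(np.float64), mysize=window)
        return np.clip(filtered, 0, 255).astype(np.uint8)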
Sauvola's method is used for the rough estimation of the foreground text region.
The pixels identified as text form a superset of the
correct text pixels.
The Wiener-filtered grayscale image is binarized by
Sauvola's method; all 0s in the output image
correspond to text regions.
A background surface is produced using the filtered
grayscale image and the roughly estimated text region.
Pixels in the resulting image that correspond to the
non-text region of the roughly estimated foreground
image keep the same value as in the original Wiener-filtered
grayscale image.
Pixels that correspond to the text region of the roughly
estimated foreground image are determined by
interpolation of the neighboring background pixels
around them.
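A simplified sketch of this background surface estimation, assuming the rough Sauvola estimate is available as a boolean text mask; the 31x31 interpolation window and the plain neighbourhood average (standing in for the interpolation used by Gatos et al.) are assumptions.

    import numpy as np


    def estimate_background(filtered, text_mask, window=31):
        """Keep filtered values at non-text pixels; interpolate text pixels from
        the surrounding background pixels."""
        h, w = filtered.shape
        background = filtered.astype(np.float64).copy()
        half = window // 2
        for y, x in zip(*np.where(text_mask)):                # roughly estimated text pixels
            y0, y1 = max(0, y - half), min(h, y + half + 1)
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            neighbours = ~text_mask[y0:y1, x0:x1]             # non-text pixels in the window
            if neighbours.any():
                background[y, x] = filtered[y0:y1, x0:x1][neighbours].mean()
        return background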
Final thresholding is done by combining the estimated
background surface with the filtered grayscale image.
The distance between each pixel of the filtered image
and the estimated background image is calculated;
a pixel corresponds to the text region if this distance
is greater than a threshold.
The threshold changes according to the pixel value of
the grayscale background image.
This preserves the textual information in the presence of a
degraded background.
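A simplified sketch of this final thresholding step: a pixel becomes text when its distance to the background surface exceeds a threshold that scales with the local background brightness. The linear rule and the constant q = 0.6 below are assumptions standing in for the full threshold function of Gatos et al.

    import numpy as np


    def final_threshold(filtered, background, text_mask, q=0.6):
        """Classify pixels as text where background minus filtered image is large."""
        filtered = filtered.astype(np.float64)
        background = background.astype(np.float64)
        distance = background - filtered                        # distance to background surface
        delta = distance[text_mask].mean()                      # average text/background distance
        avg_bg = background[~text_mask].mean()                  # average background brightness
        d = q * delta * (background / max(avg_bg, 1e-6))        # brighter background -> higher threshold
        return np.where(distance > d, 0, 255).astype(np.uint8)  # 0 = text, 255 = background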
A post-processing step is required for further
reduction of the noise level in the output binary image.
This post-processing is done using shrink and swell
filters.
The shrink filter examines each foreground pixel; if the
number of background pixels under the window is greater
than some threshold value, the foreground pixel is
converted to a background pixel.
This thins the characters and reduces salt-and-pepper
noise in the background surface.
The swell filter examines every background pixel; if the
number of foreground pixels is greater than the
threshold, the background pixel is converted to a
foreground pixel.
This fills holes in the text and increases the thickness
of the text characters.
The size of the filtering window and the threshold
value depend on the average size of the characters.
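A sketch of this shrink-and-swell post-processing under assumed parameters (a 3x3 window and neighbour-count thresholds of 6); in the source these values depend on the average character size.

    import numpy as np
    from scipy.ndimage import convolve


    def shrink_swell(binary, window=3, shrink_thresh=6, swell_thresh=6):
        """Remove isolated foreground pixels (shrink), then fill small holes (swell)."""
        fg = (binary == 0)                                     # foreground (text) pixels
        kernel = np.ones((window, window), dtype=int)
        # Shrink: a foreground pixel with many background neighbours becomes background.
        bg_count = convolve((~fg).astype(int), kernel, mode='constant', cval=1)
        fg = fg & ~(bg_count > shrink_thresh)
        # Swell: a background pixel with many foreground neighbours becomes foreground.
        fg_count = convolve(fg.astype(int), kernel, mode='constant', cval=0)
        fg = fg | (fg_count > swell_thresh)
        return np.where(fg, 0, 255).astype(np.uint8)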
Halabi et al.'s method is almost the same as the previous
method, except that it uses a Gaussian filter in the
preprocessing step.
Due to the use of the Gaussian filter, blurring of edges
occurs.
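A sketch of this alternative preprocessing step, assuming SciPy's Gaussian filter with an illustrative sigma of 1.0.

    import numpy as np
    from scipy.ndimage import gaussian_filter


    def preprocess_gaussian(image, sigma=1.0):
        """Gaussian smoothing in place of the Wiener filter; note that it blurs edges."""
        smoothed = gaussian_filter(image.astype(np.float64), sigma=sigma)
        return np.clip(smoothed, 0, 255).astype(np.uint8)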

With Niblack's method, binarization performs well in
the areas near the text region. However, a large
amount of noise is observed in non-textual portions.
This is the main drawback of Niblack's approach.
With Sauvola's method, binarization performance is
good in the presence of noise such as a non-uniform
background or low contrast. It solves the problem of
Niblack's method, i.e. the noisy binarization of non-
textual regions, by taking care of the dynamic range of
the grayscale background. But as the degradation of
the background becomes severe, the resulting
binarization performance decreases.

The method proposed by Gatos et al. performs well in the
presence of a degraded background. The threshold
calculation depends on the estimated background
surface, so the text image is binarized while taking the
degradation of the background into account.
The method proposed by Y. S. Halabi et al. also
performs well in the presence of degradation. The
Gaussian function used in the preprocessing step takes
care of the noise present in the original grayscale text
document image. However, the Gaussian function blurs
edges, and information loss may occur because of this
blurring effect.

Gatos B., I. Pratikakis, S. J. Perantonis, "Adaptive Degraded
Document Image Binarization", Pattern Recognition, 39 (2006).
Sauvola J., M. Pietikainen, "Adaptive Document Image
Binarization", Pattern Recognition, 33 (2000).
Basilios Gatos, Ioannis Pratikakis, Stavros J. Perantonis,
"An Adaptive Binarization Technique for Low Quality
Historical Documents" (2004).
Yahia S. Halabi, Zaid SA, Faris Hamdan, Khaled Haj Yousef,
"Modeling Adaptive Degraded Document Image
Binarization and Optical Character System" (2009).
W. Niblack, "An Introduction to Digital Image Processing",
Prentice-Hall, Englewood Cliffs, NJ, 1986.
ABBYY FineReader (www.finereader.com).
