Beruflich Dokumente
Kultur Dokumente
1
International Journal of Computer Applications (0975 – 8887)
Volume *– No.*, ___________ 2013
A general statement of the problem of machine recognition of corresponding to its data field is stored for further stages of
OCR can [2] be formulated as: given a scanned image of a classification as seen in figure 2.
handwritten manuscript, detect and recognize characters from
it using a stored database of characters. Available information
such as alphabets, digits, special characters may be used in
narrowing the search for better accuracy.
The solution to the mentioned problem consists of six stages: Figure 2 Segment Data Row
Scanning, Image Pre-processing, Segmentation, Feature
Extraction, SVM based Classification & Recognition and
Analysis & storage. 3.3.Preprocessing
3.1.Template processing The preprocessing [4] algorithm deal with more specific
problems such as Binarization (thresholding), Smoothing &
This is one of the most crucial steps of the process. In this
noise removal, segmentation and Normalization. Preprocessing
process, the intention is to adjust the image of the questionnaire
stage involves all the operations to produce a clean character
and align it to the intended template. This ensures that the
image to provide an efficient input to the feature extraction
coordinates stored in the system for processing the data
stage. Before extracting features from an image, a sequence of
extraction exactly matches with the handwritten data and loss
simple, common preprocessing is applied in order to
of data is avoided. A processed template of size 954 x 1236 is
standardize the data and make it feasible for the recognition
as shown in figure 1.
algorithms thereby reducing the complexity to a certain extent.
3.3.1.Binarization (Thresholding)
Binarization [5] is one of the important techniques for
preprocessing stage. It converts a colored or grayscale image
into a black and white image. Among many binarization
techniques, the Otsu’s method [1] is considered as the most
commonly-used technique. If a pixel value is greater than or
equal to the threshold intensity, it is considered as a white ("0").
If a pixel in the image has intensity less than the threshold
value, the resulting pixel is black (“1”).
If an image is a color image, it is first converted into a grayscale
image. The grayscale image then undergoes the process of
binarization. We obtained a threshold value for each image
using MATLAB’s graythresh (image) function and passing the
image as a parameter. The threshold is then used to binaries the
image. Figure shows the conversion of a color image to a
grayscale image. Figure 3 corresponds to an image before
binarization and figure 4 represents the image after the
binarization process.
3.2.Data Extraction
We will now have a form that is aligned with the template
stored in the system. We will also have preprocessed
coordinates of the details that need to be cropped and extracted. Figure 4 Image after Binarization
In this process, the handwritten data is matched to its
corresponding field. The stages for data extraction are as
explained below: 3.3.2.Smoothing and Noise Removal
3.2.1.Data Segmentation The process of scanning introduces noise. Smoothing is used to
Data segmentation process crops a part of the form image to eliminate the noises. Smoothing and noise removal [7] can be
match it to its particular field. The process occurs after the done by filtering. Circular Mean filtering [2] is an easy to
image undergoes template processing. The processed image implement method of smoothing images. It reduces the amount
contains only handwritten data. This handwritten data image of intensity variation between one pixel and the next.
2
International Journal of Computer Applications (0975 – 8887)
Volume *– No.*, ___________ 2013
3.3.3.Character Segmentation
Segmentation is a crucial step of OCR systems as it extracts
meaningful regions for analysis. Segmentation is the process of
detecting points and it is important for recognition because
determining the start and end of a character is vital in the
correct detection of the character. This in turn affects the
accuracy of the algorithm in question. Segmentation will
provide the individual characters which can then be further
processed for identification. Figure 5 represents the segmented
characters of the input image.
3.3.4.Normalization
Total Recognition Time for
3.4.Multi-class
Dataset Support
samples rate Vector
(%) Machine
training
Classification
3.5.Storage500
Numeric 80 10 minutes
4.PERFORMANCE EXPERIMENTS
5.CONCLUSION
Alphabets 500 74 30 minutes
6.REFERENCES
Alpha-
500 65 50 minutes
Numeric
Alpha-
1000 68 2.5 hours
Numeric
Alpha- 8 hours
2255 72
Numeric
Alpha- 14 hours
3000 76
Numeric
Alpha- 22 hours
4500 83
Numeric