
8th ICCIT 2005

Islamic University of Technology (IUT), 28-30 December 2005

Bengali Handwritten Bank Check Recognition Using Automatic Extraction of the User-Entered Data
Md. Rezaul Hoque Khan, Gahangir Hossain †
Department of Electrical & Electronics Engineering,
Chittagong University of Engineering & Technology, Chittagong-4349, Bangladesh.

Department of Computer Science & Engineering,
Bangladesh University of Engineering & Technology, Bangladesh.
sohagiut@yahoo.com , ghcsecuet@yahoo.com

Abstract

This paper proposes a methodology for recognizing bank checks written in Bengali through automatic extraction of the user-entered data. Despite its apparent simplicity, a check is a very complex document. It integrates pictorial images (colored layout), many pre-printed components (logos, guidelines, labels of data-entry fields, ...) as well as handwritten components (signature, date, issuing place, literal and courtesy amounts, ...). In addition, these items of information do not have a normalized position, and their structure varies across countries and institutions. A template is designed to extract the user-entered items of any bankcheck, no matter which financial institution has issued it. The extracted data are then recognized and verified against a database.

Keywords: Extraction, Recognition, Verification, Bankcheck.

I. INTRODUCTION

Handwriting recognition has become exceedingly popular in the past few years. Most applications belong to the off-line recognition category, which deals with already written documents acquired by a scanner or a camera. One of the most representative examples of this category is bank check processing. The account number is printed on the checks in magnetic ink and is the only field that can be processed automatically with near-perfect accuracy by magnetic ink character recognition (MICR) systems. The other fields may be handwritten, typed, or printed; they contain the name of the recipient, the date, the amount to be paid (textual format), the courtesy amount (numerical format) and the signature of the person who wrote the check.

The official value of the check is the amount written in words; this field of the check is called the “literal amount”. The amount written in numbers is supposed to be for courtesy purposes only and is therefore called the “courtesy amount”. The information contained in a check is frequently handwritten, especially considering that most of the checks that were written by computer systems have been gradually replaced by newer methods of electronic payment. Handwritten text and numbers are difficult for automatic systems (and sometimes even for humans) to read, so check processing normally involves manual reading of the checks and keying their respective values into the computer. Accordingly, the field of automatic check processing has witnessed sustained interest for a long time. The performance of handwriting recognition is greatly improved by constraining the writing, which eases the problem of segmentation and makes people write more carefully. A system that is able to read checks automatically would be very helpful, especially if it is fast and accurate. Even if a misclassification occurs, the mistake could potentially be detected during the recognition process; however, it is more desirable that the system rejects a check in case of doubt so that it can be directed to manual processing from the beginning.

The organization of this paper is as follows. Section II gives a brief description of bank checks. Section III provides a system overview. Section IV describes, step by step, the extraction of the user-entered data and the elimination of the information that is not of interest. Section V presents the data recognition process. Section VI describes the database against which the data are verified. Finally, the discussion and conclusion are stated in the last section.

II. DESCRIPTION OF BANKCHECKS

Fig. 1: A sample of a Bangladeshi bank check (Sonali Bank)

[Figure 2 layout fields: Check No; Account No; Name and Address of the Financial Institute; Date; Payee’s name; Literal amount; Courtesy amount; Signature]

Fig. 2: A bankcheck layout description

ISBN 984-32-2873-1 ICCIT 2005 650


In Bangladesh, each financial institution is responsible for designing and providing bankchecks to its customers. The financial institutions must follow some general standard rules stated by the Bangladesh Bank to manufacture checks. These standards determine the elements in a check, their location and size, as well as the specification for the paper. In spite of this standardization, the financial institutions are free to customize some parts of the bankchecks, such as the background pattern. The background pattern is generally used to personalize the bankchecks; it can assume different colors and have imprinted pictures and drawings. Other variations may occur among the bankchecks issued by different financial institutions, such as different character fonts, special symbols, logos, lines, etc. Figure 1 shows a sample of a Bangladeshi bankcheck. Figure 2 shows the distribution of the information that the bankchecks may have, in accordance with the standards of Sonali Bank, Bangladesh.

III. SYSTEM OVERVIEW

The system that we propose for automatic processing of bankchecks is composed of four main modules: data acquisition, image processing, data recognition and database, as shown in Figure 3.

[Figure 3 diagram: Real Bankcheck -> Optical Scanner -> Image Processing -> Date, Literal amount, Courtesy amount, Signature]

Fig. 3: System Overview

The data acquisition module includes two devices, an optical scanner and a MICR scanner. Many devices include both an optical and a MICR scanner. The optical scanner provides a digitized image of the bankchecks with 200 dpi spatial resolution and 256 gray levels for the image processing module, while the MICR scanner reads the account number. The image processing module will be composed of several algorithms for extracting the user-entered data: position evaluation, for adjusting the skew angle and evaluating the horizontal and vertical position of the scanned bankcheck images; item extraction, for extracting the user-entered data; background elimination, for eliminating the background pattern present on the bankcheck images; baseline erasing and character elimination, for eliminating the vertical and horizontal baselines and the character string printed below the signature baseline; and finally tracing recovery, for recovering some pixels that were lost during the item extraction and baseline elimination. The data recognition module will comprise different architectures for recognizing the courtesy amount and the literal amount, and a method for signature verification. The database will be designed to store financial institution information, the background pattern samples, customer information, bankcheck layout information, and other parameters needed for bankcheck processing.

IV. USER-ENTERED DATA EXTRACTION

In spite of intensive research efforts, the degree of automation in processing bankchecks is still very limited. Recent papers have addressed the problem of automatic processing of bankchecks [1],[2],[4]. Currently, two strategies are used for extracting the user-entered data: thresholding techniques and image subtraction. Different thresholding techniques have been suggested to isolate the user-entered data from the bankchecks [2],[3],[7]. These techniques have shown good results only if the bankchecks do not have complex background patterns. If these techniques were applied to bankchecks in which the background pattern has colorful pictures and drawings, it would be very difficult to find a threshold value to segment the background from the other elements. Furthermore, it would be very difficult to segment the printed information from the user-entered data, since they can have similar gray levels. On the other hand, the techniques based on image subtraction have proven more robust in segmenting the user-entered data. Okada and Shridhar (1997) [7] have proposed a method for extracting the user-entered data from American bankchecks that have colorful pictures in the background pattern.

In this paper we propose an approach in which the information in which we are not interested will be eliminated in a stage before the information extraction. The main idea is to handle only sub-images that contain the user-entered data, namely the courtesy amount, the literal amount, the date, the payee’s name, and the signature. After digitizing, the bank check image will be position-adjusted and a template will be used to extract the areas where the user-entered data is supposed to appear. For each of the five resulting sub-images, the corresponding background pattern, taken from a sample stored in a database, will be subtracted. The baselines present in the sub-images will be detected by using an algorithm based on

the projection profiles. The detected baselines will be erased by substituting the corresponding positions with white pixels. The character strings present below the baseline dedicated to the signature will be eliminated by another morphological subtraction operation between the corresponding sub-image and a generated binary image that contains every make-up character string to be eliminated. In this new approach we propose to include a tracing recovery algorithm that will recover some parts of the user-entered data that can be lost during the baseline erasing. This novel approach assumes that a sample of every background pattern is available in a database.

A. Item Extraction

Due to the standardization of the layout structure of bankchecks, it is reasonable to design a template for extracting the data of interest from any bankcheck, no matter which customer has filled it in or which financial institution has issued it, using only prior knowledge about the domain. As we are only interested in the user-entered data, the template must include every area where this data is supposed to appear on the bankchecks. The other areas can be eliminated without compromising the understanding of the document. Nevertheless, these areas are not located at exactly the same position on every bankcheck. There are small position variations, depending on the financial institution that has issued the bankcheck. Thus, the template must be adapted to these small variations in order to avoid selecting wrong areas, which can cause the loss or the degradation of the data of interest. From a basic template, we propose to construct a database with the possible positions of the data of interest. During the processing, the database will be accessed and the appropriate parameters will be selected to adjust the template, according to the financial institution that issued the bankcheck being processed. The basic template is presented in Fig. 4. The information present in the white areas will be kept, while the information that coincides with the black areas will be eliminated. By applying the template we will be able to eliminate the redundant information, that is, the information in which we are not interested, and segment the different user-entered data. Figure 5 shows the resulting image after applying the template operation. The output of the item extraction algorithm is a set of sub-images representing the user-entered items: digit amount, worded amount, payee’s name, date and signature.

Fig. 4: The template

Fig. 5: The extracted areas
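The template operation of Section IV-A can be illustrated with a minimal sketch. It assumes, for illustration only, that the template is a set of rectangular boxes (the paper describes white and black areas but does not give their geometry) and that the per-institution adjustment is a simple pixel offset. The names `BASE_TEMPLATE`, `BANK_OFFSETS`, `adjust_template` and `extract_items`, as well as all coordinates, are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]          # rows of gray levels (0 = black, 255 = white)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def adjust_template(template: Dict[str, Box], offset: Tuple[int, int]) -> Dict[str, Box]:
    """Shift every box by the (dy, dx) offset stored for the issuing institution."""
    dy, dx = offset
    return {
        name: (top + dy, left + dx, bottom + dy, right + dx)
        for name, (top, left, bottom, right) in template.items()
    }


def extract_items(image: Image, template: Dict[str, Box]) -> Dict[str, Image]:
    """Keep only the pixels inside each template box; everything else is dropped."""
    height, width = len(image), len(image[0])
    items: Dict[str, Image] = {}
    for name, (top, left, bottom, right) in template.items():
        # Clip the box to the image so a large offset cannot index out of range.
        top, bottom = max(top, 0), min(bottom, height)
        left, right = max(left, 0), min(right, width)
        items[name] = [row[left:right] for row in image[top:bottom]]
    return items


# Hypothetical base template and per-bank offset, in pixels (toy values).
BASE_TEMPLATE: Dict[str, Box] = {
    "date": (0, 6, 2, 10),
    "payee_name": (2, 0, 4, 10),
    "literal_amount": (4, 0, 6, 10),
    "courtesy_amount": (6, 6, 8, 10),
    "signature": (8, 5, 10, 10),
}
BANK_OFFSETS: Dict[str, Tuple[int, int]] = {"example_bank": (1, -1)}
```

A caller would look up the offset for the issuing institution, call `adjust_template`, and pass the result to `extract_items` to obtain the five sub-images. Real templates would use the 200 dpi coordinates of the actual check layout rather than these toy values.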

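The baseline detection and erasing steps described in Section IV can be sketched as follows. This is only an illustration under stated assumptions: a row is treated as a horizontal baseline when at least a fraction `min_fill` of its pixels are darker than `ink_threshold`; both parameters and all function names are hypothetical, since the paper does not specify its projection-profile algorithm.

```python
from typing import List

Image = List[List[int]]  # rows of gray levels (0 = black, 255 = white)
WHITE = 255


def horizontal_profile(image: Image, ink_threshold: int = 128) -> List[int]:
    """Count the dark pixels in every row (the horizontal projection profile)."""
    return [sum(1 for pixel in row if pixel < ink_threshold) for row in image]


def detect_baselines(image: Image, min_fill: float = 0.8,
                     ink_threshold: int = 128) -> List[int]:
    """Return the indices of rows that are dark over most of their width."""
    width = len(image[0])
    profile = horizontal_profile(image, ink_threshold)
    return [y for y, count in enumerate(profile) if count >= min_fill * width]


def erase_baselines(image: Image, rows: List[int]) -> Image:
    """Return a copy of the image with the given rows replaced by white pixels."""
    width = len(image[0])
    erased = [row[:] for row in image]
    for y in rows:
        erased[y] = [WHITE] * width
    return erased
```

Erasing a whole row also removes any handwriting pixel that crosses the baseline, which is the loss that the tracing recovery step is meant to repair. Vertical baselines could be handled the same way with a column profile.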
V. DATA RECOGNITION

A. Courtesy Amount recognition

Our proposed courtesy amount format is

1 2 0 0 0 0

In the case of checks, the segmentation of unconstrained strings into individual digits is a challenging task because of connected and overlapping digits, broken digits, and digits that are physically connected to pieces of strokes from neighboring digits. Our proposed architecture, shown in Figure 6, involves four stages: segmentation of the string into individual digits, normalization, recognition of each character using a neural network classifier, and syntactic verification.

[Figure 6 diagram: Image of Courtesy Amount -> Segmentation and Recognition (Divide String into Digits; Slant Correction, Size Normalization, Thickness Normalization; Neural Network) -> Post-Processing]

Fig. 6: Key steps in reading the courtesy amount.
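Of the normalization steps named in Figure 6, size normalization is the simplest to illustrate. The sketch below crops a segmented digit to its ink bounding box and rescales it to a fixed square with nearest-neighbour sampling. The target size, the ink threshold and the function names are assumptions made for illustration; slant correction and thickness normalization are not covered.

```python
from typing import List

Image = List[List[int]]  # rows of gray levels (0 = black, 255 = white)


def crop_to_ink(image: Image, ink_threshold: int = 128) -> Image:
    """Crop the image to the bounding box of its dark (ink) pixels."""
    rows = [y for y, row in enumerate(image)
            if any(pixel < ink_threshold for pixel in row)]
    cols = [x for x in range(len(image[0]))
            if any(row[x] < ink_threshold for row in image)]
    if not rows:  # blank image: nothing to crop
        return [row[:] for row in image]
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Rescale with nearest-neighbour sampling (no interpolation)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def normalize_digit(image: Image, size: int = 16) -> Image:
    """Crop to the ink and stretch to a size x size square."""
    return resize_nearest(crop_to_ink(image), size, size)
```

Stretching to a square ignores the aspect ratio of the digit. That is a deliberate simplification here; a practical system might pad the shorter side instead so that narrow digits are not distorted.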

Neural network based recognition

While template matching, structural analysis and neural networks have all been very popular classification techniques for character recognition, neural networks are now increasingly used in handwriting recognition applications. In our work, the recognition module will be implemented as an array of four neural networks working in parallel. The proposed recognition procedure is shown in Figure 7.
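The paper does not state how the outputs of the four networks are combined. As one possible illustration, the sketch below averages the per-class scores of the networks and rejects the digit when the best class is not convincing enough, in line with the stated preference for rejecting doubtful checks. The averaging rule, both thresholds and the function names are assumptions, not the authors' method.

```python
from typing import List, Optional, Sequence


def combine_scores(score_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class scores produced by the individual networks."""
    count = len(score_sets)
    return [sum(class_scores) / count for class_scores in zip(*score_sets)]


def classify_with_reject(score_sets: Sequence[Sequence[float]],
                         min_score: float = 0.6,
                         min_margin: float = 0.2) -> Optional[int]:
    """Return the winning class index, or None to send the item to manual processing."""
    mean = combine_scores(score_sets)
    ranked = sorted(range(len(mean)), key=mean.__getitem__, reverse=True)
    best, second = ranked[0], ranked[1]
    # Reject when the winner is weak or too close to the runner-up.
    if mean[best] < min_score or mean[best] - mean[second] < min_margin:
        return None
    return best
```

Each inner sequence holds the class scores from one network for the same digit image. Returning `None` lets the caller route the whole check to manual processing.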

Fig. 7: General scheme of Recognition Module.

B. Literal Amount recognition

One of the most difficult tasks in automatic check processing is the literal amount recognition. Indeed, writing found on checks certainly covers all the possible and conceivable writing styles. In addition, these entries are far from being the best writing samples that their authors could produce: besides the fact that a writer is rarely happy to fill in a check, he or she is not necessarily well placed to do it. For these reasons, researchers try to use all available contextual knowledge and to combine several complementary approaches to achieve highly reliable results in literal amount recognition. Figure 8 shows the general architecture for literal amount recognition in our proposed system.

Fig. 8: General architecture of the proposed system

C. Signature verification

Although handwritten signatures are by no means the most reliable means of personal identification, they remain one of the most widely accepted. They are also non-intrusive, inexpensive and among the most commonly used means of personal identification. Extensive work has already been done in the signature verification field, and a further benefit is that there is no language boundary: a method suggested for one language can be applied to another. For our bankcheck processing we found that the off-line signature verification system using a hidden Markov model and cross-validation [5] would be the most suitable, with a mean error rate of less than 1.5%.

VI. DATABASE

A database will be designed to store images, parameters and data that will be used during the bankcheck processing steps. The images are the samples of the background patterns from the bankchecks. Several parameters will be stored in the database, such as the parameters used to adjust the template. The data represents the information printed on the bankchecks, such as the customer's name, the agency’s address, the account number, etc. The information stored in the database must be indexed using adequate keywords that will provide a unique identification for each account. The keywords used for indexing the information in the database will be composed of the three fields present on the bankchecks, as follows:

bank number, agency number, account number

To reduce the complexity of the database, it will be designed with three levels: bank-level, agency-level and account-level. The bank-level will store the background pattern samples and the parameters used to adjust the template, according to the financial institution characteristics. The data which is printed on the

bankchecks regarding the name and address of the bank’s agency will be included in the agency-level and stored as ASCII text. This level will also include a list of the agency’s customers and their account numbers. The account-level will contain the personal information about every customer and the data that is printed on the bankchecks, such as the customer’s name, his/her personal information, etc. This data will also be stored as ASCII text. The size of the database is directly related to the number of financial institutions, agencies and accounts.

VII. DISCUSSION AND CONCLUSION

While extensive efforts have already been devoted to Latin and oriental check processing systems, to the best of our knowledge no attempts have been made towards a Bengali handwritten check processing system. This is probably due to the lack of the supporting infrastructure required to conduct, develop, and compare such systems in order to advance towards systems operating on real databases. Our system aims at a complete solution for extracting and identifying the information from bankchecks. The other approaches focus mainly on the extraction of the digit and worded amounts, and sometimes the extraction of the date [8], [10], [11]. None of them has proposed a solution for extracting all the items of the bankchecks. By applying our proposed system, an automatic bankcheck processing system for payment might be feasible for practical applications.

REFERENCES

[1] L. Koerich, L. L. Lee, “Automatic Extraction of Filled-in Information from Bankchecks Based on Prior Knowledge About Layout Structure”, First Brazilian Symposium on Document Image Analysis (1997), 322–333.
[2] G. Dimauro, S. Impedovo, G. Pirlo, A. Salzo, “Bankcheck Recognition Systems: Re-Engineering the Design Process”, III Int. Workshop on Frontiers in Handwriting Recognition (1996), 419–426.
[3] G. Dimauro, S. Impedovo, G. Pirlo, A. Salzo, “Automatic Bankcheck Processing: A New Engineered System”, Int. Journal of Pattern Recognition and Artificial Intelligence 11 (1997), 467–504.
[4] J. E. B. Santos, F. Bortolozzi, R. Sabourin, “A Simple Methodology to Bank Cheque Segmentation”, First Brazilian Symposium on Document Image Analysis (1997), 334–343.
[5] E. J. R. Justino, A. El Yacoubi, F. Bortolozzi, R. Sabourin, “An Off-Line Signature Verification System Using Hidden Markov Model and Cross-Validation”, XIII Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI'00), p. 105.
[6] S. Knerr, V. Anisimov, O. Baret, N. Gorski, D. Price, J. C. Simon, “The A2iA Intercheque System: Courtesy Amount and Legal Amount Recognition for French Checks”, Int. Journal of Pattern Recognition and Artificial Intelligence 11 (1997), 505–548.
[7] M. Okada, M. Shridhar, “Extraction of User Entered Components from a Personal Bankcheck Using Morphological Subtraction”, Int. Journal of Pattern Recognition and Artificial Intelligence 11 (1997), 699–715.
[8] A. Bishnu, B. B. Chaudhuri, “Segmentation of Bangla handwritten text into characters by recursive contour following”, Proc. 5th ICDAR (1999), 402–405.
[9] Yi-Kai Chen, Jhing-Fa Wang, “Segmentation of Single- or Multiple-Touching Handwritten Numeral String Using Background and Foreground Analysis”, IEEE PAMI vol. 22 (2000), 1304–1317.
[10] U. Pal, Sagarika Datta, “Segmentation of Bangla Unconstrained Handwritten Text”, ICDAR 2003, 1128–1132.
[11] U. Pal, B. B. Chaudhuri, “Automatic Recognition of Unconstrained Off-Line Bangla Handwritten Numerals”, ICMI 2000, 371–378.

