
Optical Character Recognition
by: Timothy, Saumya & Owais

What is an OCR?
An OCR (optical character recognition) system consists of a normal
scanner and some special software. The scanner is used to scan the
text on a document or a piece of paper into the computer. The OCR
software then examines the page and converts the letters into a form
that can be edited and processed by a normal word processing package.
How accurately the characters are recognized depends on how clear
the print or handwriting is.
OCR software can read different styles and sizes of printed text as
well as neat handwriting.
Although it is often up to 95% accurate, any text scanned with OCR
needs careful checking because some letters can be misread.
OCR is used, for example, to automatically recognize postcodes on
letters at sorting offices.
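
To make an accuracy figure such as "95%" concrete, one common way to measure it is to compare the OCR output with a hand-typed reference and count the character edits needed to turn one into the other. The sketch below assumes that definition of accuracy (one minus edits per reference character); the function names are illustrative and not taken from any OCR package.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_accuracy(reference, ocr_output):
    """Share of reference characters that did not need correcting (0.0 to 1.0)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


# One misread letter in a four-letter word gives 75% accuracy.
print(character_accuracy("abcd", "abxd"))
```

At 95% accuracy a page of 2,000 characters would still contain around 100 errors, which is why the checking step matters.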

Software involved
Software such as Cuneiform and Tesseract use a two-pass approach to
character recognition.
Before recognition, the scanned image is usually pre-processed:
De-skew: If the document was not aligned properly when scanned, it may need to
be tilted a few degrees clockwise or counter-clockwise in order to make lines of text
perfectly horizontal or vertical.
Despeckle: Removes positive and negative spots and smooths edges.
Binarization: Converts an image from colour or grey scale to black-and-white (called
a "binary image" because there are two colours). In some cases, this is necessary
for the character recognition algorithm; in other cases, the algorithm performs
better on the original image and so this step is skipped.
Line removal: Cleans up non-glyph boxes and lines.
Layout analysis or "zoning": Identifies columns, paragraphs, captions, etc. as
distinct blocks. Especially important in multi-column layouts and tables.
Line and word detection: Establishes the baseline for word and character shapes and
separates words if necessary.
Character isolation or "segmentation": For per-character OCR, multiple characters
that are connected due to image artefacts must be separated; single characters
that are broken into multiple pieces due to artefacts must be connected.
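
As an illustration of the binarization step, here is a minimal sketch that turns a grey-scale image into a binary one. It assumes the image is a list of rows of integers from 0 (black) to 255 (white) and picks the cut-off with Otsu's method, one widely used way of choosing a threshold. This is an assumption for the example and not necessarily what any particular OCR package does; the function names are illustrative.

```python
def otsu_threshold(pixels):
    """Pick the grey level (0-255) that best separates dark from light pixels."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0     # number of pixels at or below the candidate threshold
    sum0 = 0   # sum of their grey levels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue              # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break                 # no light pixels left
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(image):
    """Return a binary image: 1 for ink (dark), 0 for paper (light)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if p <= t else 0 for p in row] for row in image]


# A tiny 2x3 "page": two dark pixels on a light background.
page = [
    [200, 210, 30],
    [220, 40, 215],
]
binary = binarize(page)
print(binary)  # [[0, 0, 1], [0, 1, 0]]
```

A single global threshold like this works best on evenly lit scans; pages with shadows or stains usually need a threshold that varies across the image.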

Advantages of OCR
Cheaper than paying someone to manually enter large amounts of text.
Much faster than someone manually entering large amounts of data.
The latest software can recreate tables and the original layout.

Disadvantages of OCR
Not 100% accurate; there are likely to be some mistakes made during
the process.
All documents need to be checked over carefully and then manually
corrected.
If the original document is of poor quality or the handwriting is
difficult to read, more mistakes will occur.
Not worth doing for small amounts of text.

Areas of application
Banking

The uses of OCR vary across different fields. One widely known application is in banking, where OCR is
used to process checks without human involvement. A check can be inserted into a machine, the
writing on it is scanned instantly, and the correct amount of money is transferred. This technology has
nearly been perfected for printed checks, and is fairly accurate for handwritten checks as well, though
it occasionally requires manual confirmation. Overall, this reduces wait times in many banks.
Legal
In the legal industry, there has also been a significant movement to digitize paper documents. In order
to save space and eliminate the need to sift through boxes of paper files, documents are being
scanned and entered into computer databases. OCR further simplifies the process by making
documents text-searchable, so that they are easier to locate and work with once in the database.
Legal professionals now have fast, easy access to a huge library of documents in electronic format,
which they can find simply by typing in a few keywords.
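
Once OCR has turned scanned pages into text, keyword search can be sketched with a simple inverted index that maps each word to the documents containing it. The document IDs and sample texts below are made up for illustration, and a real document database would add ranking, stemming and tolerance for OCR misreads.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map every word to the set of document IDs whose OCR text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical OCR output for two scanned files.
scanned = {
    "file-001": "Lease agreement for the office premises",
    "file-002": "Employment agreement and confidentiality terms",
}
index = build_index(scanned)
print(search(index, "lease agreement"))
```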
Healthcare
Healthcare has also seen an increase in the use of OCR technology to process paperwork. Healthcare
professionals always have to deal with large volumes of forms for each patient, including insurance
forms as well as general health forms. To keep up with all of this information, it is useful to input
relevant data into an electronic database that can be accessed as necessary. Form processing tools,
powered by OCR, are able to extract information from forms and put it into databases, so that every
patient's data is promptly recorded. As a result, healthcare providers can focus on delivering the best
possible service to every patient.

Pictures of OCR hardware
