1760 Neil Avenue

150 Pomerene Hall

Columbus, OH 43210-1297
Phone: (614) 292-3307

Email: webaccess@osu.edu
URL: www.wac.ohio-state.edu

Creating Accessible PDF from Scanned Documents


WAC Workshop. March 2005.
Written and Presented by: Lori Bailey.
Table of Contents:

Scanned PDF and Accessibility

Options for Scanning to PDF
How to Scan to PDF
Scanning Basics
Creating PDF: TIFF to PDF using Acrobat 6 Professional
Creating PDF: Directly using Acrobat 6 Professional

Editing Your PDF for Accessibility

Steps for An Accessible Scanned PDF Document
Performing OCR Using Acrobat 6 Professional
Cleaning Up Acrobat OCR
Verifying Your Document Text
Adding Tags to Your Document
Checking Your Document for Accessibility

Resources
Guides and Tutorials
Software

Required Software.
Acrobat 6.0 Professional or higher with Accessibility Checker.

Scanner with highest resolution (DPI) available.

Alternate: use one of the commercially available software converters

PDF From Scanning: March 2005

Scanned PDF and Accessibility


Scanned PDF documents represent the most problematic type of document in terms of accessibility
to users of assistive technology. Why? Because, in most cases, a document scanned directly to
PDF or scanned and then converted to PDF will be stored as a large image file. Each page
will contain one large image with all text, tables, images, and graphics grouped into that image.
Text on the page is not searchable or selectable. To the assistive technology user, the document
appears completely blank.
To make a scanned document accessible, you must convert the image of the document into "real"
text. That is, the text must be selectable and scalable. This is usually done through OCR
(Optical Character Recognition). If the PDF version is also to be your accessible version, you'll
need to add accessibility mark-up: adding "tags" to your PDF, adding alternative text
for images, graphs, and charts, and adding header information to data tables. In addition, text
created from a scanned image of a document is often converted into unexpected segments, and
these segments may be out of order in terms of the expected read-order of the document. Once
your document is converted, you'll need to perform several checks to ensure that the correct
read-order is established.

Options for Scanning to PDF


Basically, you have two choices for creating your PDF document from paper. You can scan your
document into an image file (typically a TIFF) and then convert the image file into a PDF. Or
you can scan directly into PDF using the "Create PDF from Scanner" option in Acrobat, your
scanner's PDF conversion option, or commercially available conversion software.
PDF experts tend to suggest the first option: scanning to a TIFF and then importing into
Acrobat or your PDF creation software. By separating the steps, you can focus first on creating
clean, high-quality scans of the document and then worry about converting to an accessible PDF.
If you process your documents directly to PDF, you may need to do several rescans at different
DPI and different settings, before you have a PDF that can be successfully manipulated.
However, once your settings have been established, we found little difference between creating a
TIFF and scanning directly to PDF.
Regardless of how you scan your document, you will need to do some follow-up after the PDF
version has been created to add accessible features. This can be a very simple process, for
simple documents, or a very lengthy and complex process, for complex documents, and much
depends on what software you have available.

How to Scan to PDF


Each scanner is different and uses different software, different defaults, and different preset
configurations. You'll probably need to experiment to find out which settings work best for the
types of documents you are scanning and converting. In the examples below, we used an Epson
Perfection 1660 Photo scanner and customized the settings to 400 DPI, Black and White Photo
output.

Scanning Basics
1) Place your document on the scanner bed. Be sure it is as straight as possible.
2) Press the scan button on the scanner front or open your scanner software and choose
"acquire image".
3) Adjust the settings of your scan software to optimize your scan. We suggest scanning in
black and white, unless you need to maintain color. Black and white documents typically
have more success in OCR, because color can shade text and cause read errors. Adobe
Acrobat can only perform OCR on documents scanned at 200-600 DPI. We found 400
DPI gave us the best OCR conversion with our sample documents.
4) Save your scanned document as an image, preferably a TIFF file (very large, but
maintains the highest-quality graphics).
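
To get a feel for why TIFF scans are "very large" and how the DPI setting drives that, the arithmetic can be sketched in Python. This is only a rough estimate under stated assumptions: a US Letter page (8.5 x 11 inches), an uncompressed image, and illustrative bit depths. None of these values come from the scanner software, and the function names are hypothetical. The 200-600 DPI bounds are the ones given in step 3.

```python
# Rough size estimate for one scanned page, assuming an uncompressed image.
# Page size (US Letter) and bit depths are illustrative assumptions.

OCR_MIN_DPI = 200  # lower bound for Acrobat OCR stated in this handout
OCR_MAX_DPI = 600  # upper bound for Acrobat OCR stated in this handout


def dpi_in_ocr_range(dpi):
    """Return True if the scan resolution falls in the 200-600 DPI range."""
    return OCR_MIN_DPI <= dpi <= OCR_MAX_DPI


def page_pixels(dpi, width_in=8.5, height_in=11.0):
    """Pixel dimensions (width, height) of a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(dpi, bits_per_pixel=1, width_in=8.5, height_in=11.0):
    """Approximate raw image size in bytes (ignores headers and compression)."""
    width, height = page_pixels(dpi, width_in, height_in)
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    for dpi in (150, 200, 400, 600):
        width, height = page_pixels(dpi)
        megabytes = uncompressed_bytes(dpi, bits_per_pixel=8) / 1_000_000
        print(f"{dpi} DPI: {width} x {height} px, "
              f"~{megabytes:.1f} MB at 8-bit gray, "
              f"in OCR range: {dpi_in_ocr_range(dpi)}")
```

Because pixel count grows with the square of the DPI, doubling the resolution roughly quadruples the raw image size, which is worth keeping in mind when scanning long documents.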

Creating PDF: TIFF to PDF using Acrobat 6 Professional


1) From the FILE menu, choose CREATE PDF and FROM FILE.
2) Navigate to your TIFF file created from the scanner and choose OPEN.
3) Save your new PDF document and edit for accessibility.

Creating PDF: Directly using Acrobat 6 Professional


1) Place your document on the scanner bed. Be sure it is as straight as possible.
2) Open Adobe Acrobat.
3) From the FILE menu, choose CREATE PDF and FROM SCANNER. The "Create PDF from
Scanner" dialog box appears.

4) Make sure your scanner is selected and slide the quality selector toward the "Higher
Quality" end.
5) Click SCAN.
6) Your scanning software may open and allow you to change the scan settings, or this may
be automated.
7) Save your PDF document and edit for accessibility.

Editing Your PDF for Accessibility


Steps for An Accessible Scanned PDF Document
To ensure your document is accessible to users of assistive technology, you'll need to edit
the PDF document:
1) Perform OCR (Optical Character Recognition) on the document to make text selectable
and searchable. Repair any problems found during OCR conversion.
2) Add descriptive tags for non-text elements: graphs, charts, images.
3) Add accessible mark-up for tables.
4) Verify the read-order of the document.

Performing OCR Using Acrobat 6 Professional.


1) Open your scanned PDF document.
2) From the DOCUMENT menu, choose PAPER CAPTURE and START CAPTURE.
3) Acrobat interprets your document and tries to produce readable text.
4) Save your document.
OCR Tip: After performing OCR, switch to Select Text mode and try to select text
in your document. The text that is highlighted has been interpreted by Acrobat.
Any text that cannot be highlighted failed to be converted. Also, notice if text is
highlighted in an odd order or if some blocks of text are skipped. This indicates
problems with read order.

Cleaning Up Acrobat OCR.


As Acrobat performs its OCR process, it creates a list of "suspect" words and characters that
could not be clearly identified. You can see all the suspect items at once: from the DOCUMENT
menu, choose PAPER CAPTURE and FIND ALL OCR SUSPECTS. Acrobat highlights all the suspect
items in the document.
You must address each OCR suspect. Any OCR suspect that you ignore will not be converted
into readable text and will be ignored by screen readers.
You can walk through the OCR suspects one by one:
1) From the DOCUMENT menu, choose PAPER CAPTURE and FIND FIRST OCR SUSPECT.
2) The FIND ELEMENT dialog box appears showing the first "suspect" set of characters.
If the suspect characters are text, you'll be able to edit them in the dialog box. Otherwise,
retype the correct text characters directly in the document using advanced editing
techniques in Acrobat.
3) Once you have corrected the suspect, choose "Accept and Find" to go to the next
suspect item.
4) When you have corrected all suspect items, save your document.

Verifying Your Document Text


After you have performed OCR and addressed all the suspect characters, you can do a quick
check to ensure that the text of your document is available to screen readers: save it as
accessible text.
1) From the FILE menu, choose "SAVE AS".
2) In the "SAVE AS" dialog box, change the "SAVE AS TYPE" to
"Text (Accessible) (*.txt)".

3) Click SAVE. Adobe converts your document to a plain text file using the same text that
would be accessible to assistive technology, including alternative text for images and
graphics.
4) Open your newly saved text version in Adobe. Compare the text in the plain text version
to the text in the PDF version: are they the same? If not, edit the text and/or edit the tags
in the PDF version and re-save as "Text (Accessible)" to check again.
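
For longer documents, the comparison in step 4 can be partly automated. The following is a minimal sketch, assuming you have a trusted reference copy of the text (for example, retyped or taken from the original source file) alongside the text from the "Text (Accessible)" export. The function names are hypothetical, and the sketch uses only Python's standard difflib module; it is not part of Acrobat.

```python
import difflib


def normalize(text):
    """Collapse whitespace so that line-wrapping differences are ignored."""
    return " ".join(text.split())


def text_similarity(expected, exported):
    """Return a similarity ratio between 0.0 and 1.0 for the two texts."""
    matcher = difflib.SequenceMatcher(None, normalize(expected), normalize(exported))
    return matcher.ratio()


def word_differences(expected, exported):
    """List (expected_words, exported_words) pairs where the texts disagree."""
    a, b = expected.split(), exported.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


if __name__ == "__main__":
    expected = "Scanned PDF and Accessibility"
    exported = "Scanned PDF and Accessibi1ity"  # digit 1 in place of letter l
    print(word_differences(expected, exported))
    # prints [('Accessibility', 'Accessibi1ity')]
```

An empty list from word_differences means the two texts agree word for word. Any pairs it returns point to places worth rechecking in the PDF, either as missed OCR suspects or as tag problems.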

Adding Tags to Your Document


Once you are certain that the necessary text is available on the document, you can add tags to
your document. Adding tags creates a duplicate of your document that is marked-up for
accessibility. Only the very latest assistive technology can read an untagged PDF. Plus,
untagged PDF cannot be reflowed to fit available screen size and cannot contain additional
information, such as alternative text for images. Thus, only a tagged PDF can be considered
accessible.
You can use Acrobat's automated feature to add tags to your document:
1) From the ADVANCED menu, choose ACCESSIBILITY and "ADD TAGS TO DOCUMENT".
2) Acrobat generates a tagged version of your document that can only be viewed in the tags
window. To open the tags window:
a. From the VIEW menu, choose NAVIGATION TABS and TAGS.
b. Use the asterisk (*) key on the Number Key Pad to open all tag levels.
c. Use the minus (-) key on the Number Key Pad to close all tag levels.
3) Check tags for accuracy, completeness, and read-order.

Checking Your Document for Accessibility


After adding tags, you can do a few quick checks to ensure your document will work well with
assistive technology. You can also use these techniques at any point in your conversion process
to check the accessibility of your document.
Highlight content
Highlighting content is a simple method to confirm:
1) Text is readable by a screen reader. Text that cannot be highlighted/selected is likely to
be skipped or ignored by screen readers. Perform another OCR and confirm that the
unselectable text is not a "suspect character".
2) Read-order of the document. The order that text is highlighted/selected is also the order
the text will be read by a screen reader. Pay particular attention to text in tables or
columns. Does the text in one cell bleed into the text in another? Can you select all of
one column and then all of the next? Read-order can be changed by rearranging the tags.
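
If you have already saved a "Text (Accessible)" export, the read-order check can also be approximated outside Acrobat. The sketch below is a hypothetical helper, not an Acrobat feature. It assumes you supply a few short phrases, each occurring only once in the document, listed in the order a reader should meet them, and it reports any phrase that is missing from the exported text or appears earlier than expected.

```python
def check_read_order(exported_text, phrases):
    """Return a list of problems; an empty list means all phrases appear in order.

    Each phrase should occur only once in the document, because only the
    first occurrence is considered.
    """
    problems = []
    last_pos = -1
    last_phrase = None
    for phrase in phrases:
        pos = exported_text.find(phrase)
        if pos == -1:
            problems.append(f"missing: {phrase!r}")
            continue
        if pos < last_pos:
            problems.append(
                f"out of order: {phrase!r} appears before {last_phrase!r}"
            )
        else:
            last_pos, last_phrase = pos, phrase
    return problems


if __name__ == "__main__":
    # Made-up export in which a footer was read between two columns.
    exported = "Column one text. Footer note. Column two text."
    expected_order = ["Column one", "Column two", "Footer note"]
    for problem in check_read_order(exported, expected_order):
        print(problem)
    # prints: out of order: 'Footer note' appears before 'Column two'
```

A check like this only samples the document, so it supplements the highlighting test above rather than replacing it.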
Reflow
Document reflow assists users who enlarge the text or who are using small screens or
resolutions, by reformatting the document to fit in the available screen. Without reflow, users
may be forced to scroll extensively, horizontally as well as vertically.
To check for reflow:
1) Increase the text size to 300% or greater.
2) From the VIEW menu, choose REFLOW.
3) Note that how a document reflows also depends on read-order.


Read Aloud
The best way to check a document's accessibility is to use the same assistive technology your
users will use to access the document. However, if you don't have access to a screen reader or
screen enlarger, you can still get a sense of how those technologies will interpret your document
by listening to it being read by Acrobat's "Read Out Loud" feature. Although not practical for
lengthy documents, such as dissertation chapters or articles, this is a good strategy for shorter
documents that will receive high circulation on your web site or will be required reading for your
users.
To read out loud:
1) From the VIEW menu, choose READ OUT LOUD.
a. Press SHIFT + CTRL + V to quickly read the current page.
b. Press SHIFT + CTRL + B to read the entire document.
2) To stop reading: go to the VIEW menu, READ OUT LOUD, and choose STOP or press SHIFT
+ CTRL + E.
For longer documents, you may want to narrow your reading to only a few key pages: in
particular, those pages that contain graphics, tables, columns, or text boxes.
Editing Tags
Any problems you find during your checks will most likely need to be addressed by editing the
tagged version of your PDF. For detailed guidance on how to edit tags and mark up images,
tables, and links for accessibility, see the WAC Handout: "Checking Your PDF for
Accessibility". It is available online at: www.wac.ohio-state.edu/pdf/checking.

Resources
Guides and Tutorials
The WAC has put together an extensive collection of guides and resources on various production
methods for accessible PDF. Visit us online at: www.wac.ohio-state.edu/pdf.
Adobe offers a number of excellent resources as well. One we recommend: Acrobat for
Educators which includes a selection of FREE online video tutorials that guide you through
how to use Acrobat from simple Bookmarks and Articles to advanced Document Collections.
Check it out at: www.adobe.com/education/acrobat/acrobat_training.html.
Want more? Check out the discussions, tips, and tools offered by Planet PDF, a community of
advanced developers ready to help you with quick solutions to your PDF problems. Includes a
very useful collection of software titles for all types of PDF creation and conversion. Online at:
www.planetpdf.com.

Software
A number of companies offer software that specializes in converting PDF to either accessible
(selectable & searchable) PDF or to other, more accessible, formats (Word, Excel, etc.). Here
are a few:
ABBYY PDF Transformer ($49.99): Quickly and accurately convert any PDF file into
Microsoft Word, Excel or HTML files without retyping and reformatting. PDF Transformer is
an ideal utility for business and home users that need to edit and repurpose a wide variety of PDF
files. [http://www.abbyyusa.com/pdftransformer.htm]
Able2Extract Professional ($120): Convert your PDF data into fully formatted Excel
spreadsheets and editable Word documents with Able2Extract Professional. Supports scanned
documents, offering 10 different conversion options in total.
[http://www.investintech.com/prod_a2e_pro.htm]
Adobe Acrobat Capture ($195): Adobe Acrobat Capture 3.0 software is the perfect addition
to Adobe Acrobat 7.0 for people who want to process high volumes of scanned paper and turn
them into searchable tagged Adobe PDF files.
[http://store.adobe.com/enterprise/accessibility/acrobatcapture30.html]
ISICopy ($99): ISICopy works with Adobe Acrobat software to extract text from image-based
PDF files, converting it into valuable editable text. There is no need to OCR an entire
page; if you have a paper-based PDF file, you can select the precise amount of text you want to
copy and then paste it into any application.
ScanSoft OmniPage Pro ($120): Quickly turn paper and PDF files into editable electronic
documents that look just like the original, complete with text, tables and graphics. Robust new
tools enable you to turn text documents into audio books and add digital signatures to your
electronic documents. [http://www.scansoft.com]
SolidConverter ($50): You do NOT need Adobe Acrobat or Reader to use our converter!
Solid Converter PDF can be used as a standalone converter tool or as a plugin for Microsoft
Word and Adobe Acrobat (not Reader). Solid Converter PDF is also available through
Explorer's right click local menu. A command line interface is available for batch processing.
[http://www.solidpdf.com/]
For large jobs:
PrimeOCR ($1500, limited # of pages): includes an "Accessible PDF" module that meets
Section 508 guidelines for an accessible document.
[http://primerecognition.com/augprime/prime_ocr.htm]
Note: prices are offered for reference only (subject to change) and do not include
any educator's or volume-licensing discounts, if applicable. Before ordering
Adobe products, find out if they are available through our OSU volume-license
agreement with SHI. See "Adobe Ordering Procedures" on the OIT Site Licensed
Software page (available to OSU faculty and staff only):
[https://cweb1.net.ohio-state.edu/software/lookup.cgi?adobeclp&1.0&win&Adobeorder.pdf]