
Recognition of Nutrition Facts Labels from Mobile Images

Olivia Grubert (ogrubert@stanford.edu)
Linyi Gao (linyigao@stanford.edu)

April 25, 2014


We propose to build an Android app to recognize nutrition facts labels on processed foods
and store the information in a database. This application will enable users to record a log
of their daily diet.
The project will consist of two main components: image preprocessing and word recognition. We will attack these components in several ways and compare the performance of the different methods.
1 Image Preprocessing
One major challenge in this project is acquiring clear images on which to run our word recognition algorithm. Though OCR has been extremely successful at reading scanned documents, interpreting text from mobile images is a much more difficult problem due to skewed perspective, unequal lighting, and overall poor image quality [1]. We will investigate some of the following preprocessing techniques in this project (a brief code sketch follows the list):
- Locally adaptive thresholding and binarization
- Median filtering for noise removal
- Recognition of the bounding box of the nutrition facts label
- Correction for skew and perspective distortion [3]
- Word segmentation via the Hough transform for line detection, taking advantage of the unique format of nutrition facts labels
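To make these steps concrete, below is a minimal sketch of the preprocessing pipeline in Python using OpenCV and NumPy. The function names and all parameter values (filter size, threshold block size, Hough settings, assumed output dimensions) are illustrative assumptions rather than tuned choices; we expect to adjust them experimentally.

import cv2
import numpy as np

def preprocess_label(path):
    """Binarize, denoise, and locate separator lines in a label photo."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

    # Median filtering suppresses salt-and-pepper noise before binarization.
    denoised = cv2.medianBlur(gray, 3)

    # Locally adaptive thresholding compensates for unequal lighting: each
    # pixel is compared against a Gaussian-weighted mean of its neighborhood.
    binary = cv2.adaptiveThreshold(denoised, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 31, 10)

    # The Hough transform finds the long horizontal rules that separate
    # line items on a nutrition facts label, giving a segmentation cue.
    edges = cv2.Canny(binary, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                            minLineLength=200, maxLineGap=5)
    return binary, lines

def correct_perspective(image, corners):
    """Warp a detected label quadrilateral (top-left, top-right,
    bottom-right, bottom-left) to a frontal view."""
    w, h = 600, 900  # assumed output size for a typical label aspect ratio
    dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    M = cv2.getPerspectiveTransform(np.float32(corners), dst)
    return cv2.warpPerspective(image, M, (w, h))

The detected Hough lines can then be sorted by vertical position to crop individual line items from the binarized image before recognition.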
2 Word Recognition
We will also compare the performance of different word recognition algorithms as part of this project. The most straightforward method of word recognition may be to run our preprocessed image through Tesseract [2] to identify the different line items in the Nutrition Facts table. However, since our bag of expected words is quite limited, we are interested in comparing these results against our own template matching or multi-class SVM classification algorithms. We believe we may be able to perform better word matching on our line items than the direct output of a general-purpose OCR system.
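As a baseline for this comparison, the sketch below runs Tesseract through the pytesseract wrapper and snaps each OCR'd line to our limited vocabulary with simple fuzzy string matching; the matching step is a stand-in for the template matching or SVM classifiers we propose to build, and the vocabulary list and similarity cutoff are illustrative assumptions.

import difflib
import pytesseract  # Python wrapper for the Tesseract OCR engine

# The limited bag of expected field names on a nutrition facts label.
VOCABULARY = ["Calories", "Total Fat", "Saturated Fat", "Cholesterol",
              "Sodium", "Total Carbohydrate", "Dietary Fiber",
              "Sugars", "Protein"]

def recognize_line_items(binary_image):
    """OCR the preprocessed label and match each line to the vocabulary."""
    raw_text = pytesseract.image_to_string(binary_image)
    items = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Snap noisy OCR output to the closest expected field name;
        # a permissive cutoff tolerates character-level OCR errors.
        match = difflib.get_close_matches(line, VOCABULARY, n=1, cutoff=0.6)
        if match:
            items.append((match[0], line))
    return items

Because the vocabulary is small, even noticeably garbled OCR output can often be snapped to the correct field name, which is the intuition behind constraining the matcher rather than trusting raw OCR output.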
2.1 Preliminary OCR Results
To test the feasibility of using OCR, we took several images of a nutrition facts label with a mobile phone and passed these images directly to ABBYY FineReader 11. These images
had various resolutions and levels of perspective distortion. For the highest-resolution image
without distortion, OCR was able to correctly identify the table layout and recognize the
text without any preprocessing. However, as perspective distortion was increased, the ability
of OCR to detect the table layout deteriorated. Likewise, character recognition accuracy decreased at lower image resolutions. These preliminary results indicate that images taken with a mobile phone will require significant preprocessing before the information in a nutrition facts image can be accurately converted into an electronic database.
References

[1] W. Bieniecki, S. Grabowski, and W. Rosenberg, "Image Preprocessing for Improving OCR Accuracy," in Proceedings of the International Conference on Perspective Technologies in MEMS Design, 2007, pp. 75-80.

[2] R. Smith, "An Overview of the Tesseract OCR Engine," in Proceedings of the Ninth International Conference on Document Analysis and Recognition, vol. 2, 2007, pp. 629-633.

[3] H. Jiang, C. Han, and K. Fan, "A Fast Approach to Detect and Correct Skew Documents," in Proceedings of the 13th International Conference on Pattern Recognition, vol. 3, 1996, pp. 742-746.
