Paper
15 October 2012
OCR enhancement through neighbor embedding and fast approximate nearest neighbors
Abstract
Generic optical character recognition (OCR) engines often perform very poorly when transcribing scanned low-resolution (LR) text documents. To improve OCR performance, we apply the Neighbor Embedding (NE) single-image super-resolution (SISR) technique to LR scanned text documents to obtain high-resolution (HR) versions, which we subsequently process with OCR. For comparison, we repeat this procedure using bicubic interpolation (BI). We demonstrate that mean-square errors (MSE) in NE HR estimates do not increase substantially when NE is trained on one Latin font style and tested on another, provided both styles belong to the same font category (serif or sans serif). This is very important in practice, since for each font size, the number of training sets required for each category may be reduced from dozens to just one. We also incorporate randomized k-d trees into our NE implementation to perform approximate nearest neighbor search, and obtain a 1000x speedup over our original NE implementation, with negligible MSE degradation. This acceleration also made it practical to combine all of our size-specific NE Latin models into a single Universal Latin Model (ULM). The ULM eliminates the need to determine the unknown font category and size of an input LR text document and match it to an appropriate model, a very challenging task, since the dpi (dots per inch) of the input LR image is generally unknown. Our experiments show that OCR character error rates (CER) were over 90% when we applied the Tesseract OCR engine to LR text documents (scanned at 75 dpi and 100 dpi) in the 6-10 pt range. By contrast, using k-d trees and the ULM, CER after NE preprocessing averaged less than 7% at 3x (100 dpi LR scanning) and 4x (75 dpi LR scanning) magnification, an improvement of over an order of magnitude. Moreover, CER after NE preprocessing was more than 6 times lower on average than after BI preprocessing.
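The abstract reports results as character error rates but does not define the metric. As a minimal sketch, assuming the common definition (Levenshtein edit distance between the OCR output and the ground-truth text, divided by the ground-truth length), CER could be computed as follows. The function names `levenshtein` and `cer` are illustrative and are not taken from the paper.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `ref` into `hyp`."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, ocr_output: str) -> float:
    """Character error rate, assumed here to be edit distance
    divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, ocr_output) / len(reference)
```

Under this definition an empty OCR output gives a CER of 100%, and the value can exceed 100% when the OCR output contains many spurious characters.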
© (2012) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
D. C. Smith, "OCR enhancement through neighbor embedding and fast approximate nearest neighbors," Proc. SPIE 8499, Applications of Digital Image Processing XXXV, 84991I (15 October 2012); https://doi.org/10.1117/12.928865
KEYWORDS
Lawrencium

Optical character recognition

Bismuth

Error analysis

Image analysis

Image processing

Image enhancement
