4 February 2013 Automated recognition and extraction of tabular fields for the indexing of census records
Robert Clawson, Kevin Bauer, Glen Chidester, Milan Pohontsch, Douglas Kennard, Jongha Ryu, William Barrett
Proceedings Volume 8658, Document Recognition and Retrieval XX; 86580J (2013) https://doi.org/10.1117/12.2004788
Event: IS&T/SPIE Electronic Imaging, 2013, Burlingame, California, United States
Abstract
We describe a system for indexing census records in tabular documents, with the goal of recognizing the content of each cell, including both headers and handwritten entries. Each document is automatically rectified, registered, and scaled to a known template, after which lines and fields are detected and delimited as cells in a tabular form. Whole-word or whole-phrase recognition of noisy machine-printed text is performed using a glyph library, providing greatly increased efficiency and accuracy (approaching 100%) while avoiding the problems inherent in traditional OCR approaches. Constrained handwriting recognition for a single author reaches accuracies as high as 98% for the Gender field and 94.5% for the Birthplace field. Multi-author accuracy (currently 82%) can be improved through an increased training set. Active integration of user feedback in the system will accelerate the indexing of records while providing a tightly coupled learning mechanism for system improvement.
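As a rough illustration of whole-word recognition against a glyph library, the sketch below compares a cropped cell image with stored word images and returns the best-matching label. It is not the authors' implementation: the pixel-agreement score, the 0.8 threshold, the toy bitmaps, and the names `similarity`, `recognize`, and `LIBRARY` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# A bitmap is a list of equal-length rows: '#' is ink, '.' is background.
Bitmap = List[str]


def similarity(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixels that agree between two same-sized bitmaps.

    Assumed scoring rule for this sketch; bitmaps of different size score 0.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0
    total = sum(len(row) for row in a)
    if total == 0:
        return 0.0
    agree = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return agree / total


def recognize(cell: Bitmap, library: Dict[str, Bitmap],
              threshold: float = 0.8) -> Tuple[str, float]:
    """Return (label, score) of the best whole-word match.

    The label is "" when no library entry reaches the (hypothetical) threshold.
    """
    best_label, best_score = "", 0.0
    for label, glyph in library.items():
        score = similarity(cell, glyph)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return "", best_score
    return best_label, best_score


# Toy stand-ins for whole-word images of two printed header words.
LIBRARY: Dict[str, Bitmap] = {
    "AGE": ["##..##", "#.##.#", "######", "#....#"],
    "SEX": [".####.", "#.....", ".####.", "....##"],
}

# The "AGE" image with one pixel flipped to simulate print noise.
noisy_cell: Bitmap = ["##..##", "#.##.#", "####.#", "#....#"]

label, score = recognize(noisy_cell, LIBRARY)
```

Matching the whole word at once means a single damaged pixel lowers the score only slightly, whereas a character-by-character approach would first have to segment the noisy word correctly.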
© (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Robert Clawson, Kevin Bauer, Glen Chidester, Milan Pohontsch, Douglas Kennard, Jongha Ryu, and William Barrett "Automated recognition and extraction of tabular fields for the indexing of census records", Proc. SPIE 8658, Document Recognition and Retrieval XX, 86580J (4 February 2013); https://doi.org/10.1117/12.2004788
KEYWORDS
Optical character recognition
Image enhancement
Image filtering
Image registration
Image processing
Digital filtering
Fourier transforms