23 March 1994 Expert system for automatically correcting OCR output
Proceedings Volume 2181, Document Recognition; (1994) https://doi.org/10.1117/12.171114
Event: IS&T/SPIE 1994 International Symposium on Electronic Imaging: Science and Technology, 1994, San Jose, CA, United States
Abstract
This paper describes a new expert system for automatically correcting errors made by optical character recognition (OCR) devices. The system, which we call the post-processing system, is designed to improve the quality of text produced by an OCR device in preparation for subsequent retrieval from an information system. It comprises several components: an information retrieval system, an English dictionary, a domain-specific dictionary, and a collection of algorithms and heuristics designed to correct as many OCR errors as possible. Errors that cannot be corrected are passed on to a user-level editing program. The post-processing system can be viewed as part of a larger system that streamlines the steps of taking a document from hard copy to usable electronic form, or as a stand-alone system for OCR error correction. An earlier version of this system has been used to process approximately 10,000 pages of OCR-generated text; about 87% of the OCR errors discovered by that version were corrected. We implement numerous new parts of the system, test this new version, and present the results.
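The abstract does not give the system's algorithms, so the following is only a minimal sketch of the general idea it describes: check each token against an English and a domain-specific dictionary, try a simple correction heuristic, and set aside whatever remains for a human editor. The confusion table, the function names `candidates` and `post_process`, and the single-substitution heuristic are all assumptions made for illustration, not the authors' method.

```python
# Hypothetical OCR confusion pairs (misread -> intended); not taken from the paper.
CONFUSIONS = {"rn": "m", "1": "l", "0": "o", "vv": "w"}


def candidates(word):
    """Yield every variant of `word` with exactly one confusion substitution applied."""
    for wrong, right in CONFUSIONS.items():
        start = word.find(wrong)
        while start != -1:
            yield word[:start] + right + word[start + len(wrong):]
            start = word.find(wrong, start + 1)


def post_process(tokens, english, domain):
    """Return (corrected_tokens, unresolved_tokens).

    A token is accepted if it appears in either dictionary, replaced if exactly
    one substitution variant appears in a dictionary, and otherwise left
    unchanged and reported for manual editing.
    """
    lexicon = english | domain
    corrected, unresolved = [], []
    for tok in tokens:
        key = tok.lower()
        if key in lexicon:
            corrected.append(tok)
            continue
        matches = {c for c in candidates(key) if c in lexicon}
        if len(matches) == 1:
            # Unambiguous fix: exactly one dictionary word is reachable.
            corrected.append(matches.pop())
        else:
            # No fix or an ambiguous one: defer to the user-level editor.
            corrected.append(tok)
            unresolved.append(tok)
    return corrected, unresolved


if __name__ == "__main__":
    english = {"the", "system", "modern"}
    domain = {"geology"}
    print(post_process(["the", "rnodern", "geo1ogy", "systern", "xqz"], english, domain))
```

Requiring a unique dictionary match before replacing a token is a conservative choice in this sketch: ambiguous cases are passed to the manual step, mirroring the abstract's hand-off of uncorrected errors to a user-level editing program.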
© (1994) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Kazem Taghva, Julie Borsack, and Allen Condit "Expert system for automatically correcting OCR output", Proc. SPIE 2181, Document Recognition, (23 March 1994); https://doi.org/10.1117/12.171114
KEYWORDS
Optical character recognition
Associative arrays
Human-machine interfaces
Geology
Infrared imaging
Interfaces
Licensing

RELATED CONTENT

OCR correction based on document level knowledge
Proceedings of SPIE (January 13 2003)
Evaluating text categorization in the presence of OCR errors
Proceedings of SPIE (December 21 2000)
Authoring hypermedia training applications
Proceedings of SPIE (September 16 1998)
Effectiveness of thesauri-aided retrieval
Proceedings of SPIE (January 07 1999)
Do Thesauri enhance rule-based categorization for OCR text?
Proceedings of SPIE (January 13 2003)
