How to find mathematics on a scanned page
22 December 1999
Richard J. Fateman
Proceedings Volume 3967, Document Recognition and Retrieval VII; (1999) https://doi.org/10.1117/12.373482
Event: Electronic Imaging, 2000, San Jose, CA, United States
Abstract
We describe the design of document analysis procedures that separate mathematics from ordinary text on a scanned page of mixed material. It is easy to observe that the accuracy of commercial OCR programs improves when mixed material is separated into two (or more) streams, with conventional non-mathematical text handled by the usual heuristic, text-based OCR analysis. The second stream, consisting of material judged to be mathematics, can be fed to a specialized recognizer. Material that this recognizer fails to decode can be passed on to a third stream that holds diagrams, logos, and other miscellaneous material, perhaps including halftones. We explore the extent to which this separation can be automated in the context of scanning archival material, including mathematical and scientific journal pages, for a digital library project.
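The stream separation described in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure: the `Zone` type, the `looks_like_math` symbol-density heuristic, its threshold, and the `toy_math_recognizer` stand-in are all hypothetical names and assumptions introduced here. Only the routing order (text stream, then math stream, then a fallback stream for material the math recognizer cannot decode) follows the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Zone:
    """Hypothetical page region: a bounding box plus a rough glyph transcription."""
    box: tuple[int, int, int, int]
    glyphs: str


# Assumption: an illustrative symbol set, not taken from the paper.
MATH_HINTS = set("=+∑∫√^_<>±×÷")


def looks_like_math(zone: Zone, threshold: float = 0.15) -> bool:
    """Placeholder classifier: share of non-space glyphs that are digits or math symbols."""
    chars = [c for c in zone.glyphs if not c.isspace()]
    if not chars:
        # Nothing text-like was found; let the math stage try and then reject it.
        return True
    hits = sum(1 for c in chars if c in MATH_HINTS or c.isdigit())
    return hits / len(chars) >= threshold


def route(zones: list[Zone],
          is_math: Callable[[Zone], bool],
          recognize_math: Callable[[Zone], Optional[str]]) -> dict[str, list]:
    """Split zones into text, math, and other streams, in that order of preference."""
    streams: dict[str, list] = {"text": [], "math": [], "other": []}
    for zone in zones:
        if not is_math(zone):
            streams["text"].append(zone)            # stream 1: conventional OCR
            continue
        decoded = recognize_math(zone)
        if decoded is None:
            streams["other"].append(zone)           # stream 3: diagrams, logos, halftones
        else:
            streams["math"].append((zone, decoded))  # stream 2: decoded mathematics
    return streams


def toy_math_recognizer(zone: Zone) -> Optional[str]:
    """Stand-in for a specialized recognizer: succeeds only if an '=' is present."""
    return zone.glyphs if "=" in zone.glyphs else None


zones = [
    Zone((0, 0, 400, 20), "We describe the design"),
    Zone((0, 30, 400, 50), "x^2 + y^2 = 1"),
    Zone((0, 60, 400, 200), ""),
]
streams = route(zones, looks_like_math, toy_math_recognizer)
```

Passing the classifier and the recognizer as arguments keeps the routing logic independent of whichever heuristics are actually used to judge a zone.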
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Richard J. Fateman "How to find mathematics on a scanned page", Proc. SPIE 3967, Document Recognition and Retrieval VII, (22 December 1999); https://doi.org/10.1117/12.373482
CITATIONS
Cited by 18 scholarly publications.
KEYWORDS
Mathematics
Optical character recognition
Computing systems
Digital libraries
Image processing
Mathematical modeling
Associative arrays
