Automated data entry system: performance issues
18 December 2001
George R. Thoma, Glenn Ford
Proceedings Volume 4670, Document Recognition and Retrieval IX; (2001) https://doi.org/10.1117/12.450734
Event: Electronic Imaging, 2002, San Jose, California, United States
Abstract
This paper discusses the performance of a system that extracts bibliographic fields from scanned pages of biomedical journals to populate MEDLINE, the flagship database of the National Library of Medicine (NLM), which is heavily used worldwide. The system consists of automated processes that extract the article title, author names, affiliations, and abstract, and manual workstations for entering the other fields required for a complete bibliographic record in MEDLINE, such as pagination, grant support information, and databank accession numbers. Labor and time data are given for (1) a wholly manual keyboarding process to create the records, (2) an OCR-based system that requires all fields except the abstract to be manually input, and (3) a more automated system that relies on document image analysis and understanding techniques to extract several fields. It is shown that this last, most automated approach requires less than 25% of the labor effort of the first, manual process.
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
George R. Thoma and Glenn Ford "Automated data entry system: performance issues", Proc. SPIE 4670, Document Recognition and Retrieval IX, (18 December 2001); https://doi.org/10.1117/12.450734
KEYWORDS
Mars
Databases
Optical character recognition
Image processing
Image analysis
Surgery
Biomedical optics
