Paper
21 December 2000 Automated labeling in document images
Author Affiliations +
Proceedings Volume 4307, Document Recognition and Retrieval VIII; (2000) https://doi.org/10.1117/12.410828
Event: Photonics West 2001 - Electronic Imaging, 2001, San Jose, CA, United States
Abstract
The National Library of Medicine (NLM) is developing an automated system to produce bibliographic records for its MEDLINER database. This system, named Medical Article Record System (MARS), employs document image analysis and understanding techniques and optical character recognition (OCR). This paper describes a key module in MARS called the Automated Labeling (AL) module, which labels all zones of interest (title, author, affiliation, and abstract) automatically. The AL algorithm is based on 120 rules that are derived from an analysis of journal page layouts and features extracted from OCR output. Experiments carried out on more than 11,000 articles in over 1,000 biomedical journals show the accuracy of this rule-based algorithm to exceed 96%.
© (2000) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jongwoo Kim, Daniel X. Le, and George R. Thoma "Automated labeling in document images", Proc. SPIE 4307, Document Recognition and Retrieval VIII, (21 December 2000); https://doi.org/10.1117/12.410828
Lens.org Logo
CITATIONS
Cited by 32 scholarly publications.
Advertisement
Advertisement
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Optical character recognition

Databases

Mars

Detection and tracking algorithms

Feature extraction

Medicine

Algorithm development

RELATED CONTENT


Back to Top