Fast title extraction method for business documents
3 April 1997
Yutaka Katsuyama, Satoshi Naoi
Proceedings Volume 3027, Document Recognition IV; (1997) https://doi.org/10.1117/12.270072
Event: Electronic Imaging '97, 1997, San Jose, CA, United States
Abstract
Conventional electronic document filing systems are inconvenient because the user must specify keywords for each document to enable later searches. To solve this problem, automatic keyword extraction methods based on natural language processing and character recognition have been developed. However, these methods are slow, especially for Japanese documents. To develop a practical electronic document filing system, we focused on extracting keyword areas from a document by image processing. Our fast title extraction method automatically extracts titles as keywords from business documents. Every character string is scored with rating points that reflect how closely it resembles a title. We classified these points into four items: character string size, position of the character string, relative position among character strings, and string attributes. The character string with the highest rating is selected as the title area, and character recognition is then carried out on the selected area only. The method is fast because recognition has to handle only the small number of patterns in this restricted area, not the entire document. In a test on 100 Japanese business documents, the method achieved a mean accuracy of about 91 percent and a mean processing time of 1.8 sec.
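The rating scheme summarized in the abstract can be illustrated with a minimal sketch. The abstract names only the four rating items; it does not give the formulas, weights, or data structures, so everything below (the `TextString` record, the four `rate_*` functions, their caps, and the unweighted sum in `select_title`) is a hypothetical stand-in chosen for illustration and should not be read as the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextString:
    """Bounding box of one character string in page pixels (hypothetical layout output)."""
    x: float                  # left edge
    y: float                  # top edge
    width: float
    height: float             # used here as a proxy for character size
    underlined: bool = False  # one illustrative "string attribute"


def rate_size(s: TextString, strings: List[TextString]) -> float:
    # Assumed rule: characters larger than the page median earn more points.
    heights = sorted(t.height for t in strings)
    median = heights[len(heights) // 2]
    return min(s.height / median, 3.0)


def rate_position(s: TextString, page_w: float, page_h: float) -> float:
    # Assumed rule: strings that are horizontally centered and near the top score higher.
    center_x = s.x + s.width / 2
    centered = 1.0 - abs(center_x - page_w / 2) / (page_w / 2)
    near_top = 1.0 - s.y / page_h
    return centered + near_top


def rate_relative(s: TextString, strings: List[TextString]) -> float:
    # Assumed rule: vertical white space to the nearest other string,
    # measured in units of the string's own height and capped at 2.
    gaps = []
    for t in strings:
        if t is s:
            continue
        if t.y >= s.y + s.height:        # t lies below s
            gaps.append(t.y - (s.y + s.height))
        elif t.y + t.height <= s.y:      # t lies above s
            gaps.append(s.y - (t.y + t.height))
        else:                            # vertical overlap
            gaps.append(0.0)
    if not gaps:
        return 0.0
    return min(min(gaps) / s.height, 2.0)


def rate_attribute(s: TextString) -> float:
    # Assumed rule: an underline is treated as a title-like attribute.
    return 1.0 if s.underlined else 0.0


def select_title(strings: List[TextString], page_w: float, page_h: float) -> TextString:
    """Return the string with the highest total rating (unweighted sum, an assumption)."""
    def total(s: TextString) -> float:
        return (rate_size(s, strings)
                + rate_position(s, page_w, page_h)
                + rate_relative(s, strings)
                + rate_attribute(s))
    return max(strings, key=total)
```

In a pipeline of the kind the abstract describes, only the box returned by `select_title` would be handed to character recognition, which is where the speed advantage over recognizing the whole page comes from.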
© (1997) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yutaka Katsuyama and Satoshi Naoi "Fast title extraction method for business documents", Proc. SPIE 3027, Document Recognition IV, (3 April 1997); https://doi.org/10.1117/12.270072
KEYWORDS
Optical character recognition
Binary data
Data conversion
Image processing
Receivers
Associative arrays
Nickel