Paper
Highly accurate retrieval method of Japanese document images through a combination of morphological analysis and OCR
18 December 2001
Yutaka Katsuyama, Hiroaki Takebe, Koji Kurokawa, Takahiro Saitoh, Satoshi Naoi
Proceedings Volume 4670, Document Recognition and Retrieval IX; (2001) https://doi.org/10.1117/12.450739
Event: Electronic Imaging, 2002, San Jose, California, United States
Abstract
We have developed a method that retrieves Japanese document images more accurately by combining OCR character candidate information with a conventional plain text search engine. In this method, the document image is first recognized by normal OCR to produce text. Keyword areas are then estimated from the OCR-produced text through morphological analysis. A lattice of candidate character codes is extracted from these areas, and character strings are then extracted from the lattice using a word-matching method in noun areas and a K-th DP-matching method in undefined word areas. Finally, the extracted character strings are added to the OCR-produced text to improve retrieval accuracy when using a conventional plain text search engine. In experiments searching 49 OHP sheet images, our method achieved a recall rate of 98.2%, compared to 90.3% for a conventional method using only the OCR-produced text, while requiring about the same processing time as normal OCR.
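The word-matching step in noun areas can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy lattice, and the three-word dictionary are all hypothetical, and the K-th DP-matching used for undefined word areas is not shown. The sketch assumes a lattice stored as one list of candidate characters per character position, with the top-ranked OCR candidate first.

```python
from typing import List, Set


def match_words_in_lattice(lattice: List[List[str]], dictionary: Set[str]) -> Set[str]:
    """Return dictionary words that can be spelled by picking one candidate
    per position over some contiguous span of the lattice (hypothetical helper)."""
    found = set()
    for word in dictionary:
        n = len(word)
        # Slide the word over every start position where it still fits.
        for start in range(len(lattice) - n + 1):
            if all(word[i] in lattice[start + i] for i in range(n)):
                found.add(word)
                break
    return found


def augment_ocr_text(ocr_text: str, lattice: List[List[str]], dictionary: Set[str]) -> str:
    """Append lattice-matched words that the top-1 OCR text missed, so that a
    plain text search engine can still find them (hypothetical helper)."""
    extra = sorted(w for w in match_words_in_lattice(lattice, dictionary)
                   if w not in ocr_text)
    return ocr_text if not extra else ocr_text + " " + " ".join(extra)


# Toy lattice (assumed data): the top-1 candidate at position 1 is a misread.
lattice = [["文", "丈"], ["害", "書"], ["画", "面"], ["像", "橡"]]
ocr_text = "".join(candidates[0] for candidates in lattice)  # top-1 OCR output
dictionary = {"文書", "画像", "検索"}

matched = match_words_in_lattice(lattice, dictionary)
augmented = augment_ocr_text(ocr_text, lattice, dictionary)
print(augmented)
```

In this toy case the top-1 OCR text misreads the second character, so a plain text search for 文書 would miss the document. Appending the lattice-matched word to the OCR text makes that query succeed without changing the search engine, which is the effect the abstract attributes to adding extracted character strings.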
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yutaka Katsuyama, Hiroaki Takebe, Koji Kurokawa, Takahiro Saitoh, and Satoshi Naoi "Highly accurate retrieval method of Japanese document images through a combination of morphological analysis and OCR", Proc. SPIE 4670, Document Recognition and Retrieval IX, (18 December 2001); https://doi.org/10.1117/12.450739
CITATIONS
Cited by 5 scholarly publications.
KEYWORDS
Optical character recognition
Calcium
Morphological analysis
Binary data
Image segmentation
Error analysis
Image retrieval