Highly accurate retrieval method of Japanese document images through a combination of morphological analysis and OCR

Yutaka Katsuyama; Hiroaki Takebe; Koji Kurokawa; Takahiro Saitoh; Satoshi Naoi

doi:10.1117/12.450739

18 December 2001



Proceedings Volume 4670, Document Recognition and Retrieval IX; (2001) https://doi.org/10.1117/12.450739
Event: Electronic Imaging, 2002, San Jose, California, United States

Abstract

We have developed a method that retrieves Japanese document images more accurately by combining OCR character-candidate information with a conventional plain-text search engine. In this method, the document image is first recognized by normal OCR to produce text. Keyword areas are then estimated from this OCR-produced text through morphological analysis. A lattice of candidate-character codes is extracted from these areas, and character strings are then extracted from the lattice using a word-matching method in noun areas and a K-th DP-matching method in undefined-word areas. Finally, these extracted character strings are added to the normal OCR-produced text to improve retrieval accuracy with a conventional plain-text search engine. Experimental results from searches of 49 OHP sheet images show that our method achieves a high recall rate of 98.2%, compared with 90.3% for a conventional method using only normal OCR-produced text, while requiring about the same processing time as normal OCR.
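The extraction and augmentation steps can be illustrated with a small Python sketch. It is not the authors' implementation. The lattice format (one list of `(character, cost)` pairs per position, lower cost meaning higher OCR confidence), the helper names `match_words`, `k_best_strings`, and `augment_text`, and all costs are hypothetical. "K-th DP matching" is read loosely here as extracting the K lowest-cost strings from the lattice by dynamic programming. The abstract does not give the actual formulation. The morphological analysis that labels each area as a noun or undefined word is assumed to have already run.

```python
import heapq
from typing import Iterable, List, Set, Tuple

# Hypothetical lattice format: one list per character position, each
# holding (candidate_char, cost) pairs; a lower cost means the OCR
# engine is more confident in that candidate.
Lattice = List[List[Tuple[str, int]]]


def match_words(lattice: Lattice, dictionary: Set[str]) -> List[str]:
    """Noun areas (sketch): return every dictionary word whose characters
    each occur among the candidates at consecutive lattice positions."""
    found = []
    for word in sorted(dictionary):
        n = len(word)
        for start in range(len(lattice) - n + 1):
            if all(
                any(ch == word[i] for ch, _ in lattice[start + i])
                for i in range(n)
            ):
                found.append(word)
                break  # one occurrence is enough for indexing
    return found


def k_best_strings(lattice: Lattice, k: int) -> List[Tuple[str, int]]:
    """Undefined-word areas (sketch): DP over positions that keeps only the
    k lowest-cost partial strings. This is exact here because costs are
    additive and each position is chosen independently."""
    beams: List[Tuple[int, str]] = [(0, "")]
    for candidates in lattice:
        extended = [
            (cost + c_cost, prefix + ch)
            for cost, prefix in beams
            for ch, c_cost in candidates
        ]
        beams = heapq.nsmallest(k, extended)
    return [(text, cost) for cost, text in beams]


def augment_text(
    ocr_text: str,
    areas: Iterable[Tuple[Lattice, bool]],
    dictionary: Set[str],
    k: int = 2,
) -> str:
    """Append strings extracted from each (lattice, is_noun) area to the
    top-1 OCR text so a plain-text search engine can index them."""
    extra: List[str] = []
    for lattice, is_noun in areas:
        if is_noun:
            extra.extend(match_words(lattice, dictionary))
        else:
            extra.extend(text for text, _ in k_best_strings(lattice, k))
    return ocr_text + "\n" + " ".join(extra)


# Toy example: top-1 OCR reads 文善 (should be 文書) and サーパ (should be サーバ).
noun_area: Lattice = [[("文", 1), ("丈", 3)], [("善", 1), ("書", 2)]]
undefined_area: Lattice = [[("サ", 1), ("ナ", 3)], [("ー", 1)], [("パ", 1), ("バ", 2)]]

indexed = augment_text(
    "文善サーパ", [(noun_area, True), (undefined_area, False)], {"文書", "検索"}
)
print(indexed)
```

In this toy case, the top-1 OCR output 文善サーパ misses both 文書 and サーバ. The appended strings make both findable by a plain substring search. Keeping only the k best partial strings at each position is exact for the K-best problem only because costs are additive and positions are independent. A real lattice with character-segmentation alternatives would likely need a graph-based K-shortest-paths method instead.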

Citation

Yutaka Katsuyama, Hiroaki Takebe, Koji Kurokawa, Takahiro Saitoh, and Satoshi Naoi "Highly accurate retrieval method of Japanese document images through a combination of morphological analysis and OCR", Proc. SPIE 4670, Document Recognition and Retrieval IX, (18 December 2001); https://doi.org/10.1117/12.450739
