Scalable ranked retrieval using document images

Rajiv Jain; Douglas W. Oard; David Doermann

doi:10.1117/12.2038656

24 March 2014

Proceedings Volume 9021, Document Recognition and Retrieval XXI; 90210K (2014) https://doi.org/10.1117/12.2038656
Event: IS&T/SPIE Electronic Imaging, 2014, San Francisco, California, United States

Abstract

Despite the explosion of text on the Internet, hard copy documents that have been scanned as images still play a significant role for some tasks. The best method to perform ranked retrieval on a large corpus of document images, however, remains an open research question. The most common approach has been to perform text retrieval using terms generated by optical character recognition. This paper, by contrast, examines whether a scalable segmentation-free image retrieval algorithm, which matches sub-images containing text or graphical objects, can provide additional benefit in satisfying a user’s information needs on a large, real-world dataset. Results on 7 million scanned pages from the CDIP v1.0 test collection show that content-based image retrieval finds a substantial number of documents that text retrieval misses and that, when used as a basis for relevance feedback, it can yield improvements in retrieval effectiveness.

Citation

Rajiv Jain, Douglas W. Oard, and David Doermann "Scalable ranked retrieval using document images", Proc. SPIE 9021, Document Recognition and Retrieval XXI, 90210K (24 March 2014); https://doi.org/10.1117/12.2038656
