The BBN Byblos Hindi OCR system

Premkumar S. Natarajan; Ehry MacRostie; Michael Decerbo

doi:10.1117/12.588810

17 January 2005 The BBN Byblos Hindi OCR system

Premkumar S. Natarajan, Ehry MacRostie, Michael Decerbo

Proceedings Volume 5676, Document Recognition and Retrieval XII; (2005) https://doi.org/10.1117/12.588810
Event: Electronic Imaging 2005, 2005, San Jose, California, United States

Abstract

The BBN Byblos OCR system implements a script-independent methodology for OCR using Hidden Markov Models (HMMs). We have successfully ported the system to Arabic, English, Chinese, Pashto, and Japanese. In this paper, we report on our recent effort in training the system to perform recognition of Hindi (Devanagari) documents. The initial experiments reported in this paper were performed using a corpus of synthetic (computer-generated) document images along with slightly degraded versions of the same that were generated by scanning printed versions of the document images and by scanning faxes of the printed versions. On a fair test set consisting of synthetic images alone we measured a character error rate of 1.0%. The character error rate on a fair test set consisting of scanned images (scans of printed versions of the synthetic images) was 1.40% while the character error rate on a fair test set of fax images (scans of printed and faxed versions of the synthetic images) was 8.7%.

Citation Download Citation

Premkumar S. Natarajan, Ehry MacRostie, and Michael Decerbo "The BBN Byblos Hindi OCR system", Proc. SPIE 5676, Document Recognition and Retrieval XII, (17 January 2005); https://doi.org/10.1117/12.588810

ACCESS THE FULL ARTICLE

INSTITUTIONAL
Select your institution to access the SPIE Digital Library.

SELECT YOUR INSTITUTION

PERSONAL
Sign in with your SPIE account to access your personal subscriptions or to use specific features such as save to my library, sign up for alerts, save searches, etc.

PERSONAL SIGN IN

No SPIE Account? Create one

PURCHASE THIS CONTENT

SUBSCRIBE TO DIGITAL LIBRARY

50 downloads per 1-year subscription

Members: $195

Non-members: $335 ADD TO CART

25 downloads per 1 - year subscription

Members: $145

Non-members: $250 ADD TO CART

PURCHASE SINGLE ARTICLE

Includes PDF, HTML & Video, when available