Paper
17 January 2005 The BBN Byblos Hindi OCR system
Premkumar S. Natarajan, Ehry MacRostie, Michael Decerbo
Author Affiliations +
Proceedings Volume 5676, Document Recognition and Retrieval XII; (2005) https://doi.org/10.1117/12.588810
Event: Electronic Imaging 2005, 2005, San Jose, California, United States
Abstract
The BBN Byblos OCR system implements a script-independent methodology for OCR using Hidden Markov Models (HMMs). We have successfully ported the system to Arabic, English, Chinese, Pashto, and Japanese. In this paper, we report on our recent effort in training the system to perform recognition of Hindi (Devanagari) documents. The initial experiments reported in this paper were performed using a corpus of synthetic (computer-generated) document images along with slightly degraded versions of the same that were generated by scanning printed versions of the document images and by scanning faxes of the printed versions. On a fair test set consisting of synthetic images alone we measured a character error rate of 1.0%. The character error rate on a fair test set consisting of scanned images (scans of printed versions of the synthetic images) was 1.40% while the character error rate on a fair test set of fax images (scans of printed and faxed versions of the synthetic images) was 8.7%.
© (2005) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Premkumar S. Natarajan, Ehry MacRostie, and Michael Decerbo "The BBN Byblos Hindi OCR system", Proc. SPIE 5676, Document Recognition and Retrieval XII, (17 January 2005); https://doi.org/10.1117/12.588810
Lens.org Logo
CITATIONS
Cited by 7 scholarly publications.
Advertisement
Advertisement
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Data modeling

Optical character recognition

Performance modeling

Feature extraction

Systems modeling

Data processing

Detection and tracking algorithms

Back to Top