Paper
1 April 1998 Compound character recognition by run-number-based metric distance
Uptal Garain, B. B. Chaudhuri
Proceedings Volume 3305, Document Recognition V; (1998) https://doi.org/10.1117/12.304622
Event: Photonics West '98 Electronic Imaging, 1998, San Jose, CA, United States
Abstract
This paper concerns automatic OCR of Bangla, a major Indian script and the fourth most popular script in the world. A Bangla OCR system has to recognize about 300 graphemic shapes, among which 250 compound characters have quite complex stroke patterns. For recognition of such compound characters, feature-based approaches are less reliable and template-based approaches are less tolerant of variation in font size and style. We combine the positive aspects of the feature-based and template-based approaches by proposing a run-number-based normalized template matching technique for compound character recognition. Run-number vectors are computed for both horizontal and vertical scanning. As the number of scans may vary from pattern to pattern, we normalize and abbreviate the vector. We prove that this normalized and abbreviated vector induces a metric distance. Moreover, the vector is invariant to scaling, insensitive to character style variation, and more effective for complex-shaped characters than for simple-shaped ones. We use this vector representation for matching within a group of compound characters. We observe that the matching is more efficient if the vector is reorganized with respect to the centroid of the pattern. We have tested our approach on a large set of segmented compound characters at different point sizes and in different styles. Italic characters are subject to preprocessing. The overall correct recognition rate is 99.69 percent.
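To make the idea of a run-number vector concrete, the following is a minimal sketch, not the authors' implementation. It counts the runs of ink pixels in every row (horizontal scanning) and every column (vertical scanning) of a binary glyph. The abstract does not specify how the vector is normalized and abbreviated, so the `resample` step (nearest-neighbour resampling to a fixed length) and the `l1_distance` comparison are assumptions chosen only for illustration; all function names, including the `upscale` helper, are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # binary image: 1 = ink, 0 = background


def count_runs(line: Sequence[int]) -> int:
    """Count maximal runs of consecutive 1s along one scan line."""
    runs = 0
    previous = 0
    for pixel in line:
        if pixel == 1 and previous == 0:
            runs += 1  # a new run starts at every 0 -> 1 transition
        previous = pixel
    return runs


def run_number_vectors(image: Image) -> Tuple[List[int], List[int]]:
    """Return (horizontal, vertical) run counts: one per row, one per column."""
    horizontal = [count_runs(row) for row in image]
    vertical = [count_runs(column) for column in zip(*image)]
    return horizontal, vertical


def resample(vector: Sequence[int], length: int) -> List[int]:
    """ASSUMPTION: stand-in normalization by nearest-neighbour resampling."""
    n = len(vector)
    return [vector[(i * n) // length] for i in range(length)]


def l1_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """ASSUMPTION: city-block distance between equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def upscale(image: Image, k: int) -> List[List[int]]:
    """Illustrative helper: enlarge a glyph by repeating each pixel k times."""
    return [[p for p in row for _ in range(k)] for row in image for _ in range(k)]


glyph = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
]
h_small, v_small = run_number_vectors(glyph)
h_large, v_large = run_number_vectors(upscale(glyph, 2))
print(h_small, v_small)
print(l1_distance(resample(h_small, 3), resample(h_large, 3)))
```

In this toy case the glyph and its enlarged copy yield the same resampled horizontal vector, which gives some intuition for why run counts tolerate size changes. The paper's own normalization, abbreviation, and centroid-based reorganization may differ from this sketch.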
© (1998) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Uptal Garain and B. B. Chaudhuri "Compound character recognition by run-number-based metric distance", Proc. SPIE 3305, Document Recognition V, (1 April 1998); https://doi.org/10.1117/12.304622
KEYWORDS
Optical character recognition
Head
Printing
Computer vision technology
Feature extraction
Image segmentation
Indium
