19 January 2009 Multi-font printed Mongolian document recognition system
Proceedings Volume 7247, Document Recognition and Retrieval XVI; 72470J (2009) https://doi.org/10.1117/12.805864
Event: IS&T/SPIE Electronic Imaging, 2009, San Jose, California, United States
Abstract
Mongolian is one of the major ethnic languages in China. A large number of printed Mongolian documents need to be digitized for digital libraries and various other applications. Traditional Mongolian script has a unique writing style and multi-font-type variations, which pose challenges for Mongolian OCR research. Because traditional Mongolian script has particular characteristics (for example, one character may be part of another character), we define the character set for recognition according to the segmented components, and the components are combined into characters by a rule-based post-processing module. For character recognition, a method based on visual directional features and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation points by analyzing the properties of projections and connected components. As Mongolian font-types are categorized into two major groups, the segmentation parameter is adjusted for each group. A font-type classification method for the two font-type groups is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective. The text recognition rate is 96.9% on test samples from practical documents with multiple font-types and mixed scripts.
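The projection-based part of the segmentation step can be illustrated with a minimal sketch. It is not the authors' scheme: the abstract gives no details, and the connected-component analysis is omitted here. The sketch assumes a binary image of one vertically written word in which neighbouring components are joined only by a thin stem, so rows whose ink count does not exceed an assumed `stem_width` are treated as candidate segmentation points. The names `row_projection`, `candidate_cut_rows`, `stem_width`, and `word` are hypothetical; `stem_width` stands in for the kind of parameter that might be adjusted per font-type group.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def row_projection(image: Image) -> List[int]:
    """Count ink pixels in each row of a binary image."""
    return [sum(row) for row in image]


def candidate_cut_rows(image: Image, stem_width: int = 1) -> List[int]:
    """Return the middle row of every run of 'thin' rows.

    A row is thin when it contains ink but no more than `stem_width`
    pixels of it (assumption: only the connecting stem is present).
    """
    profile = row_projection(image)
    cuts: List[int] = []
    run_start = None
    # A trailing sentinel that is not thin closes a run ending at the last row.
    for y, count in enumerate(profile + [stem_width + 1]):
        thin = 0 < count <= stem_width
        if thin and run_start is None:
            run_start = y
        elif not thin and run_start is not None:
            cuts.append((run_start + y - 1) // 2)
            run_start = None
    return cuts


# Toy word image (hypothetical data): three blobs joined by a one-pixel stem.
word: Image = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
]

print(candidate_cut_rows(word))  # [3, 7]
```

In a fuller system, candidates like these would presumably be filtered further, for instance using connected-component properties as the abstract mentions, before the components are passed to recognition.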
© (2009) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Liangrui Peng, Changsong Liu, Xiaoqing Ding, Hua Wang, and Jianming Jin "Multi-font printed Mongolian document recognition system", Proc. SPIE 7247, Document Recognition and Retrieval XVI, 72470J (19 January 2009); https://doi.org/10.1117/12.805864
KEYWORDS
Optical character recognition
Image segmentation
Visualization
Error analysis
Image quality
Digital libraries
Intelligence systems