Paper
15 December 2003 Word level script identification for scanned document images
Author Affiliations +
Proceedings Volume 5296, Document Recognition and Retrieval XI; (2003) https://doi.org/10.1117/12.530538
Event: Electronic Imaging 2004, 2004, San Jose, California, United States
Abstract
In this paper, we compare the performance of three classifiers used to identify the script of words in scanned document images. In both training and testing, a Gabor filter is applied and 16 channels of features are extracted. Three classifiers (Support Vector Machines (SVM), Gaussian Mixture Model (GMM) and k-Nearest-Neighbor (k-NN)) are used to identify different scripts at the word level (glyphs separated by white space). These three classifiers are applied to a variety of bilingual dictionaries and their performance is compared. Experimental results show the capability of Gabor filter to capture script features and the effectiveness of these three classifiers for script identification at the word level.
© (2003) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Huanfeng Ma and David Doermann "Word level script identification for scanned document images", Proc. SPIE 5296, Document Recognition and Retrieval XI, (15 December 2003); https://doi.org/10.1117/12.530538
Lens.org Logo
CITATIONS
Cited by 29 scholarly publications and 6 patents.
Advertisement
Advertisement
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Associative arrays

Image segmentation

Feature extraction

Image quality

Gaussian filters

Image processing

Optical character recognition

Back to Top