General Chinese document capture system with improved error-rejecting module
13 January 2003
Proceedings Volume 5010, Document Recognition and Retrieval X; (2003) https://doi.org/10.1117/12.476037
Event: Electronic Imaging 2003, Santa Clara, CA, United States
Abstract
This paper introduces a newly designed general-purpose Chinese document data capture system, Tsinghua OCR Network Edition (TONE). The system aims to cut the high cost of digitizing large volumes of Chinese paper documents. Our first step was to divide the whole data-entry process into a few single-purpose procedures. Based on these procedures, we then developed a production-line-like system configuration. By design, management cost is reduced directly, by substituting automated task scheduling for traditional manual assignment, and indirectly, by adopting a well-designed quality control mechanism. Classification distances, character image positions, and context grammars are combined to reject questionable characters. Experiments showed that when 19.91% of the characters are rejected, the residual error rate could be 0.0097% (below one error per ten thousand characters). This makes the improved error-rejecting module practical to apply. Given the cost distribution in data companies (specifically, manual correction accounts for 70% of the total), the estimated total cost reduction could be over 50%.
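The figures in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not the authors' cost model: it assumes, hypothetically, that manual correction cost scales linearly with the fraction of characters a human must review (here, only the rejected ones) and that all other costs stay unchanged. The function names are invented for this example.

```python
def residual_errors_per_10k(residual_error_rate_pct: float) -> float:
    """Convert a percentage error rate into errors per 10,000 characters."""
    return residual_error_rate_pct / 100 * 10_000


def estimated_cost_reduction(manual_share: float, reject_rate: float) -> float:
    """Fraction of total cost saved when only rejected characters are reviewed.

    Assumption (not from the paper): manual correction cost is proportional
    to the fraction of characters reviewed, and other costs do not change.
    """
    remaining_manual = manual_share * reject_rate   # manual cost still incurred
    new_total = (1 - manual_share) + remaining_manual
    return 1 - new_total


# Figures quoted in the abstract.
errors = residual_errors_per_10k(0.0097)
saving = estimated_cost_reduction(manual_share=0.70, reject_rate=0.1991)

print(f"{errors:.2f} residual errors per 10,000 characters")
print(f"{saving:.1%} estimated total cost reduction")
```

Under these assumptions the residual rate works out to 0.97 errors per 10,000 characters and the saving to roughly 56% of total cost, which is consistent with the "over 50%" estimate. A real deployment would also have to account for scheduling and quality-control overhead, which this sketch ignores.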
© (2003) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Dahai Luan, Changsong Liu, and Xiaoqing Ding "General Chinese document capture system with improved error-rejecting module", Proc. SPIE 5010, Document Recognition and Retrieval X, (13 January 2003); https://doi.org/10.1117/12.476037
KEYWORDS
Optical character recognition
Image processing
Error analysis
Image segmentation
Image restoration
Data analysis
Digital libraries