JBIG2 text image compression based on OCR
16 January 2006
Proceedings Volume 6067, Document Recognition and Retrieval XIII; 60670D (2006) https://doi.org/10.1117/12.641557
Event: Electronic Imaging 2006, San Jose, California, United States
Abstract
The JBIG2 standard for bi-level image coding, developed by the Joint Bi-level Image Experts Group (JBIG), is drafted so that encoder design is left to individual implementers. In JBIG2, text images are compressed by pattern matching techniques. In this paper, we propose a lossy text image compression method based on optical character recognition (OCR) that compresses bi-level images into the JBIG2 format. Processing a text image with OCR yields a recognition result for each character together with a confidence for that result. For a group of similar character image blocks, a representative symbol image is generated using the OCR results, the block sizes, and the mismatches between blocks. This symbol image replaces all the similar image blocks, which yields a high compression ratio. Experimental results show that our algorithm achieves improvements of 75.86% over lossless soft pattern matching (SPM) and 14.05% over lossy pattern matching and substitution (PM&S) on Latin character images, and 37.9% over lossless SPM and 4.97% over lossy PM&S on Chinese character images. Our algorithm produces far fewer substitution errors than previous lossy PM&S and thus preserves acceptable decoded image quality.
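The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Block` fields, the thresholds `min_conf` and `max_mismatch`, the exact-size requirement, and the choice of the first block in a group as its representative are all assumptions made for illustration, since the abstract does not give these details.

```python
from dataclasses import dataclass
from typing import List, Tuple

Bitmap = Tuple[Tuple[int, ...], ...]  # rows of 0/1 pixels


@dataclass
class Block:
    label: str         # OCR recognition result (hypothetical field)
    confidence: float  # OCR confidence in [0, 1] (hypothetical scale)
    pixels: Bitmap     # bi-level character image


def mismatch(a: Bitmap, b: Bitmap) -> float:
    """Fraction of differing pixels between two bitmaps of equal size."""
    total = len(a) * len(a[0])
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def same_class(block: Block, rep: Block,
               min_conf: float, max_mismatch: float) -> bool:
    """Combine the three cues named in the abstract: OCR result, size, mismatch."""
    same_size = (len(block.pixels) == len(rep.pixels)
                 and len(block.pixels[0]) == len(rep.pixels[0]))
    return (block.label == rep.label
            and block.confidence >= min_conf
            and rep.confidence >= min_conf
            and same_size  # assumption: sizes must match exactly
            and mismatch(block.pixels, rep.pixels) <= max_mismatch)


def build_dictionary(blocks: List[Block], min_conf: float = 0.9,
                     max_mismatch: float = 0.1
                     ) -> Tuple[List[Block], List[int]]:
    """Return (symbols, refs); refs[i] is the symbol index that replaces blocks[i]."""
    symbols: List[Block] = []
    refs: List[int] = []
    for block in blocks:
        for i, rep in enumerate(symbols):
            if same_class(block, rep, min_conf, max_mismatch):
                refs.append(i)  # lossy step: block is replaced by the representative
                break
        else:
            symbols.append(block)  # assumption: first block seen is the representative
            refs.append(len(symbols) - 1)
    return symbols, refs
```

In this sketch a block with low OCR confidence never merges with another block, which is one plausible way that recognition confidence could limit substitution errors. The fewer symbols the dictionary holds relative to the number of blocks, the more the page compresses.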
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Junqing Shang, Changsong Liu, and Xiaoqing Ding "JBIG2 text image compression based on OCR", Proc. SPIE 6067, Document Recognition and Retrieval XIII, 60670D (16 January 2006); https://doi.org/10.1117/12.641557
CITATIONS
Cited by 4 scholarly publications.
KEYWORDS
Image compression
Optical character recognition
Image segmentation
Scanning probe microscopy
Computer programming
Image quality
Algorithms