Symbolic document image compression based on pattern matching techniques
30 September 2011
Chwan-Yi Shiah, Yun-Sheng Yen
Proceedings Volume 8285, International Conference on Graphic and Image Processing (ICGIP 2011); 82851D (2011) https://doi.org/10.1117/12.913413
Event: 2011 International Conference on Graphic and Image Processing, 2011, Cairo, Egypt
Abstract
This paper proposes a novel compression algorithm for Chinese document images. Documents are first segmented into readable components such as characters and punctuation marks. Similar patterns within the text are found by shape context matching and grouped to form a set of prototype symbols. Text redundancy is then removed by replacing repeated symbols with their corresponding prototype symbols. To keep the compression visually lossless, a multi-stage symbol clustering procedure groups similar symbols while ensuring that the decompressed image contains no visible error. In the encoding phase, the resulting data streams are encoded by adaptive arithmetic coding. Our results show that the average compression ratio is better than that of the international standard JBIG2, and that the compressed form of a document image is suitable for content-based keyword searching.
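The prototype-substitution step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes symbols have already been segmented into equally sized binary bitmaps, it swaps in a simple normalized Hamming distance where the paper uses shape context matching, it uses a single greedy pass rather than the multi-stage clustering procedure, and it omits the adaptive arithmetic coding stage. The names `hamming_ratio`, `build_prototypes`, `decode`, and the `threshold` parameter are hypothetical.

```python
from typing import List, Tuple

# A symbol is a tuple of rows of 0/1 pixels.
# Assumption: every symbol has the same width and height.
Bitmap = Tuple[Tuple[int, ...], ...]


def hamming_ratio(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixels that differ between two equally sized bitmaps.

    Stand-in for shape context matching, which is not implemented here.
    """
    total = sum(len(row) for row in a)
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def build_prototypes(
    symbols: List[Bitmap], threshold: float = 0.15
) -> Tuple[List[Bitmap], List[int]]:
    """Greedy single-pass clustering.

    Each symbol is assigned to the first prototype whose distance is at
    most `threshold`; otherwise the symbol becomes a new prototype.
    Returns the prototype list and one prototype index per input symbol.
    """
    prototypes: List[Bitmap] = []
    indices: List[int] = []
    for sym in symbols:
        for i, proto in enumerate(prototypes):
            if hamming_ratio(sym, proto) <= threshold:
                indices.append(i)
                break
        else:
            # No prototype was close enough: start a new cluster.
            prototypes.append(sym)
            indices.append(len(prototypes) - 1)
    return prototypes, indices


def decode(prototypes: List[Bitmap], indices: List[int]) -> List[Bitmap]:
    """Rebuild the symbol sequence by substituting each index with its prototype."""
    return [prototypes[i] for i in indices]


if __name__ == "__main__":
    box = ((1, 1, 1), (1, 0, 1), (1, 1, 1))
    box_noisy = ((1, 1, 1), (1, 0, 1), (1, 1, 0))  # one pixel differs from box
    bar = ((0, 1, 0), (0, 1, 0), (0, 1, 0))
    protos, idx = build_prototypes([box, bar, box_noisy, bar])
    print(len(protos), idx)
```

In this sketch the index stream plus the prototype bitmaps stand in for the data streams that the paper passes to the entropy coder. The `threshold` value controls how aggressively symbols are merged; a value that is too loose would substitute visibly different shapes, which is the kind of error the paper's multi-stage clustering is described as avoiding.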
© (2011) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Chwan-Yi Shiah and Yun-Sheng Yen "Symbolic document image compression based on pattern matching techniques", Proc. SPIE 8285, International Conference on Graphic and Image Processing (ICGIP 2011), 82851D (30 September 2011); https://doi.org/10.1117/12.913413
KEYWORDS
Image compression
Prototyping
Distance measurement
Binary data
Computer programming
Visualization
Image segmentation