30 March 1995 Extraction of text layout structures on document images based on statistical characterization
Su S. Chen, Robert M. Haralick, Ihsin T. Phillips
Proceedings Volume 2422, Document Recognition II; (1995) https://doi.org/10.1117/12.205815
Event: IS&T/SPIE's Symposium on Electronic Imaging: Science and Technology, 1995, San Jose, CA, United States
Abstract
Textual structures on a document image, such as characters, words, text lines, and paragraphs, are usually laid out in a highly structured manner with preferred spatial relations. These spatial relations are rarely deterministic; instead, they describe correlations and likelihoods. Therefore, any realistic document layout analysis algorithm should utilize this type of knowledge in order to optimize its performance. In this paper, we first describe a method for automatically generating a large amount of almost 100% correct ground-truth data for document layout analysis. The bounding boxes for the characters, words, text lines, and paragraphs are expressed in a hierarchy. Then, based on this layout ground truth, we build statistical models of the layout structures for the words, text lines, and paragraphs on document images. Finally, we describe an algorithm that utilizes these statistical models to extract the text words on document images. The performance of the algorithm is evaluated and reported.
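The abstract gives no implementation details, so the following is only a minimal illustrative sketch of what a bounding-box hierarchy and a simple spacing measurement could look like. The `Box` type, the helper names, and the fixed gap threshold are all assumptions made for illustration. The paper builds statistical models from ground truth, whereas this sketch substitutes a plain threshold and is not the authors' algorithm.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Box:
    """Axis-aligned bounding box with optional child boxes (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float
    children: List["Box"] = field(default_factory=list)


def enclose(boxes: List[Box]) -> Box:
    """Build a parent box that tightly encloses the given child boxes."""
    return Box(
        min(b.x0 for b in boxes),
        min(b.y0 for b in boxes),
        max(b.x1 for b in boxes),
        max(b.y1 for b in boxes),
        list(boxes),
    )


def horizontal_gaps(boxes: List[Box]) -> List[float]:
    """Gaps between horizontally adjacent boxes, ordered left to right."""
    ordered = sorted(boxes, key=lambda b: b.x0)
    return [right.x0 - left.x1 for left, right in zip(ordered, ordered[1:])]


def group_into_words(chars: List[Box], threshold: float) -> List[Box]:
    """Split one text line's character boxes into word boxes.

    A new word starts wherever the gap to the previous character exceeds
    `threshold` (an assumed fixed value standing in for a learned model).
    """
    ordered = sorted(chars, key=lambda b: b.x0)
    if not ordered:
        return []
    words, current = [], [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.x0 - prev.x1 > threshold:
            words.append(enclose(current))
            current = []
        current.append(cur)
    words.append(enclose(current))
    return words
```

Applying `enclose` again to the resulting word boxes would give a text-line box, and so on up to paragraphs, which mirrors the character, word, text line, paragraph hierarchy named in the abstract. In a statistical setting, the values returned by `horizontal_gaps` over many ground-truth lines would be the raw observations from which spacing distributions are estimated.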
© (1995) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Su S. Chen, Robert M. Haralick, and Ihsin T. Phillips "Extraction of text layout structures on document images based on statistical characterization", Proc. SPIE 2422, Document Recognition II, (30 March 1995); https://doi.org/10.1117/12.205815
CITATIONS
Cited by 3 scholarly publications and 2 patents.
KEYWORDS
Image segmentation
Image processing algorithms and systems
Algorithm development
Statistical analysis
Binary data
Image processing
Databases