Paper
Text characterization by connected component transformations
23 March 1994
Larry Spitz
Proceedings Volume 2181, Document Recognition; (1994) https://doi.org/10.1117/12.171097
Event: IS&T/SPIE 1994 International Symposium on Electronic Imaging: Science and Technology, 1994, San Jose, CA, United States
Abstract
Worldwide there are many different scripts and languages in common use. Finding text lines, and character and word boundaries where present, is a necessary primitive operation for most document processing applications. We have developed a method for handling text lines from several different languages that is robust in the presence of common printing and scanning artifacts. A technique is described by which information about the characteristics of a text line can be determined from a list of the connected pixel components that comprise the image. This technique applies across many languages and scripts that are laid out horizontally. For text comprising Roman type, the location and dimensions of each text line are augmented with the positions of the baseline and x-height. Where appropriate, the coordinates of space-delimited words and individual character cells are determined. This technique incorporates a computationally inexpensive method for straightening curved lines and segmenting kerned characters, and a novel method based on font weight and stress for locating the boundaries of individual characters, even if their images touch.
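The abstract does not give the algorithm itself, so the following is only a minimal, generic sketch of the kind of pipeline it describes, not the paper's method. It assumes a binary image stored as a list of rows (1 = ink), labels 8-connected components with a flood fill, and then, for a single already-isolated horizontal text line, estimates the baseline and x-line from the component bounding boxes and splits the line into space-delimited words. The names `Box`, `connected_components`, `line_metrics` and `split_words`, the median-based estimators, and the `gap_ratio` threshold are all hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from statistics import median


@dataclass
class Box:
    """Bounding box of one connected component (inclusive pixel coordinates)."""
    left: int
    top: int
    right: int
    bottom: int


def connected_components(image):
    """Label 8-connected foreground pixels (value 1) with a BFS flood fill.

    Returns one bounding Box per component, in raster-scan discovery order.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            box = Box(c, r, c, r)
            while queue:
                y, x = queue.popleft()
                box.left, box.right = min(box.left, x), max(box.right, x)
                box.top, box.bottom = min(box.top, y), max(box.bottom, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append(box)
    return boxes


def line_metrics(boxes):
    """Estimate (baseline, x_line) rows for one horizontal text line.

    Assumption: most components are lowercase x-height letters sitting on the
    baseline, so the median bottom row approximates the baseline and the median
    top row approximates the x-line. Image rows grow downward.
    """
    baseline = median(b.bottom for b in boxes)
    x_line = median(b.top for b in boxes)
    return baseline, x_line


def split_words(boxes, x_height, gap_ratio=0.5):
    """Group components into words, breaking wherever the blank gap between a
    component and everything to its left exceeds gap_ratio * x_height.
    gap_ratio is a hypothetical tuning parameter, not a value from the paper.
    """
    if not boxes:
        return []
    ordered = sorted(boxes, key=lambda b: b.left)
    words, current = [], [ordered[0]]
    right_edge = ordered[0].right
    for b in ordered[1:]:
        gap = b.left - right_edge - 1  # number of blank columns in between
        if gap > gap_ratio * x_height:
            words.append(current)
            current = []
        current.append(b)
        right_edge = max(right_edge, b.right)
    words.append(current)
    return words
```

Medians are used here because they tolerate a minority of ascenders, descenders, i-dots and punctuation without any font model. The sketch deliberately omits what the abstract highlights as the paper's contributions: straightening curved lines, separating kerned characters, and splitting touching characters using font weight and stress.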
© (1994) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Larry Spitz "Text characterization by connected component transformations", Proc. SPIE 2181, Document Recognition, (23 March 1994); https://doi.org/10.1117/12.171097
CITATIONS
Cited by 19 scholarly publications and 1 patent.
KEYWORDS
Image segmentation

Image processing

Raster graphics

Printing

Detection and tracking algorithms

Algorithm development

Image processing algorithms and systems
