Table analysis for multiline cell identification
21 December 2000
Proceedings Volume 4307, Document Recognition and Retrieval VIII; (2000) https://doi.org/10.1117/12.410853
Event: Photonics West 2001 - Electronic Imaging, 2001, San Jose, CA, United States
Abstract
A table in a document is a rectilinear arrangement of cells where each cell contains a sequence of words. Several lines of text may compose one cell. Cells may be delimited by horizontal or vertical lines, but often they are not. A table analysis system is described that reconstructs table formatting information from table images whether or not the cells are explicitly delimited. Inputs to the system are word bounding boxes and any horizontal and vertical lines that delimit cells. A sequence of carefully crafted rules finds multi-line cells and their interrelationships even when no explicit delimiters are visible. This robust system is a component of a commercial document recognition system.
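To make the input and output of such a system concrete, the following is a minimal sketch and not the rule sequence used in the paper, which the abstract does not spell out. It assumes word bounding boxes given as `(x0, y0, x1, y1)` with the y axis pointing down, ignores ruled lines entirely, and applies one hypothetical rule: a text line with an empty first column continues the row above it. The names `Word`, `group_into_lines`, `column_spans`, `build_cells`, and the `min_gap` threshold are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Word:
    """A recognized word and its bounding box (y grows downward)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def group_into_lines(words: List[Word]) -> List[List[Word]]:
    """Group words with overlapping vertical extents into text lines."""
    lines: List[List[Word]] = []
    for w in sorted(words, key=lambda w: w.y0):
        if lines and w.y0 < max(v.y1 for v in lines[-1]):
            lines[-1].append(w)
        else:
            lines.append([w])
    return [sorted(line, key=lambda w: w.x0) for line in lines]


def column_spans(words: List[Word], min_gap: float = 10.0) -> List[Tuple[float, float]]:
    """Merge horizontal word extents; gaps of at least min_gap split columns."""
    spans: List[List[float]] = []
    for w in sorted(words, key=lambda w: w.x0):
        if spans and w.x0 - spans[-1][1] < min_gap:
            spans[-1][1] = max(spans[-1][1], w.x1)
        else:
            spans.append([w.x0, w.x1])
    return [(a, b) for a, b in spans]


def build_cells(words: List[Word], min_gap: float = 10.0) -> List[List[str]]:
    """Return table rows as lists of cell strings, merging continuation lines."""
    spans = column_spans(words, min_gap)
    rows: List[List[str]] = []
    for line in group_into_lines(words):
        cells = [""] * len(spans)
        for w in line:
            center = (w.x0 + w.x1) / 2
            i = next(k for k, (a, b) in enumerate(spans) if a <= center <= b)
            cells[i] = (cells[i] + " " + w.text).strip()
        # Assumed rule (not from the paper): an empty first column marks
        # a continuation of the previous row, i.e. a multi-line cell.
        if rows and not cells[0]:
            rows[-1] = [(a + " " + b).strip() for a, b in zip(rows[-1], cells)]
        else:
            rows.append(cells)
    return rows


if __name__ == "__main__":
    sample = [
        Word("Item", 0, 0, 30, 10), Word("Description", 60, 0, 130, 10),
        Word("Pump", 0, 20, 35, 30), Word("Centrifugal", 60, 20, 125, 30),
        Word("unit", 130, 20, 155, 30),
        Word("with", 60, 32, 85, 42), Word("seal", 90, 32, 115, 42),
        Word("Valve", 0, 50, 38, 60), Word("Gate", 60, 50, 90, 60),
    ]
    for row in build_cells(sample):
        print(row)
```

A single rule like this fails on many real tables (for example, when the first column itself wraps), which is presumably why the described system relies on a longer sequence of rules and on ruled lines when they are present.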
© (2000) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
John C. Handley "Table analysis for multiline cell identification", Proc. SPIE 4307, Document Recognition and Retrieval VIII, (21 December 2000); https://doi.org/10.1117/12.410853
KEYWORDS
Image segmentation
Databases
Switches
Optical character recognition
Systems modeling
Data modeling
Image processing
