A mixed approach to auto-detection of page body
28 January 2008
Liangcai Gao, Zhi Tang, Ruiheng Qiu
Proceedings Volume 6815, Document Recognition and Retrieval XV; 68150T (2008) https://doi.org/10.1117/12.765886
Event: Electronic Imaging, 2008, San Jose, California, United States
Abstract
In most documents, the page body holds the central information of a page. This paper addresses the problem of automatically detecting the page body area in digital books and journals, and details a novel method based on font expansion and header and footer elimination. The method first extracts the body text font (BFont) and the headers and footers of a document. It then draws two page body bounding boxes for each page: one by analyzing the distribution of BFont on the page, and the other by removing the headers and footers from the page. Finally, the two bounding boxes are combined to obtain the resulting page body bounding box. Test results show a high recognition rate, with precision of up to 99.49%.
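As a rough illustration of the pipeline described in the abstract, the Python sketch below computes a BFont-based box and a header/footer-stripped box for each page and merges them. It is not the authors' implementation, and several details are assumptions: the names `TextLine`, `body_font`, `header_footer_texts` and `page_body_boxes` are hypothetical; BFont is taken to be the font that covers the most characters in the document; a header or footer is taken to be a topmost or bottommost line whose text (with digits masked, so running page numbers match) recurs on at least half of the pages; and the two boxes are combined by taking their union, because the abstract does not state the combination rule. The font expansion step is not modelled.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Box = Tuple[float, float, float, float]  # (x0, top, x1, bottom); y grows downward


@dataclass(frozen=True)
class TextLine:
    text: str
    font: str  # hypothetical font key, e.g. "Times-Roman@10"
    x0: float
    top: float
    x1: float
    bottom: float


Page = List[TextLine]


def normalize(text: str) -> str:
    # Mask digits so that "Page 1" and "Page 2" compare equal.
    return "".join("#" if ch.isdigit() else ch for ch in text.strip())


def body_font(pages: List[Page]) -> Optional[str]:
    # Assumed BFont definition: the font covering the most characters.
    counts: Counter = Counter()
    for page in pages:
        for ln in page:
            counts[ln.font] += len(ln.text)
    most = counts.most_common(1)
    return most[0][0] if most else None


def header_footer_texts(pages: List[Page], min_ratio: float = 0.5) -> Set[str]:
    # Hypothetical heuristic: a page's topmost/bottommost line is a header or
    # footer if its normalized text recurs on at least min_ratio of the pages.
    counts: Counter = Counter()
    for page in pages:
        if not page:
            continue
        ordered = sorted(page, key=lambda ln: ln.top)
        counts.update({normalize(ordered[0].text), normalize(ordered[-1].text)})
    threshold = min_ratio * len(pages)
    return {text for text, n in counts.items() if n >= threshold}


def bbox(lines: List[TextLine]) -> Optional[Box]:
    # Smallest box enclosing all given lines, or None if there are none.
    if not lines:
        return None
    return (min(ln.x0 for ln in lines), min(ln.top for ln in lines),
            max(ln.x1 for ln in lines), max(ln.bottom for ln in lines))


def union(a: Optional[Box], b: Optional[Box]) -> Optional[Box]:
    # Smallest box enclosing both inputs; a missing box is ignored.
    if a is None or b is None:
        return a if b is None else b
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def page_body_boxes(pages: List[Page]) -> List[Optional[Box]]:
    if not pages:
        return []
    bfont = body_font(pages)
    hf = header_footer_texts(pages)
    result = []
    for page in pages:
        # Box 1: extent of the lines set in the body text font.
        font_box = bbox([ln for ln in page if ln.font == bfont])
        # Box 2: extent of all lines whose text is not a detected header/footer.
        stripped_box = bbox([ln for ln in page if normalize(ln.text) not in hf])
        # Assumed combination rule: the union of the two boxes.
        result.append(union(font_box, stripped_box))
    return result
```

The union lets the stripped box recover body content set in other fonts (such as a section heading), while the BFont box fills in when no header or footer is detected. A different merge rule, such as intersection, would trade those cases off differently.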
© (2008) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Liangcai Gao, Zhi Tang, and Ruiheng Qiu "A mixed approach to auto-detection of page body", Proc. SPIE 6815, Document Recognition and Retrieval XV, 68150T (28 January 2008); https://doi.org/10.1117/12.765886
CITATIONS
Cited by 4 scholarly publications.
KEYWORDS
Image processing
Associative arrays
Computer science
Computing systems
Electronic imaging
Error analysis
Excel
