A mixed approach to auto-detection of page body

Liangcai Gao; Zhi Tang; Ruiheng Qiu

doi:10.1117/12.765886

28 January 2008

Proceedings Volume 6815, Document Recognition and Retrieval XV; 68150T (2008) https://doi.org/10.1117/12.765886
Event: Electronic Imaging, 2008, San Jose, California, United States

Abstract

The page body holds the central information of a page in most documents. This paper addresses the problem of automatically detecting the page body area in digital books or journals. A novel method based on font expansion and header and footer elimination is detailed. The method first extracts the body text font (BFont) and the headers and footers from a document, and then draws two page body bounding boxes for each page: one by analyzing the distribution of BFont across pages, the other by removing headers and footers from pages. Finally, the two bounding boxes are combined to obtain the resulting page body bounding box. Test results demonstrate a very high recognition rate, with precision of up to 99.49%.
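The two-box idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how BFont is chosen, how headers and footers are found, or how the two boxes are combined. The sketch assumes that BFont is the font covering the most characters, that header and footer lines have already been flagged by an earlier step, and that the boxes are combined by intersection. The names `Line`, `body_font`, `bounding_box`, `combine`, and `page_body_box` are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """One text line on a page: font name, bounding box, and character count."""
    font: str
    x0: float
    y0: float
    x1: float
    y1: float
    n_chars: int
    # Assumption: an earlier step has already flagged headers and footers.
    is_header_or_footer: bool = False


def body_font(pages):
    """Assumption: BFont is the font that covers the most characters overall."""
    counts = Counter()
    for page in pages:
        for line in page:
            counts[line.font] += line.n_chars
    return counts.most_common(1)[0][0]


def bounding_box(lines):
    """Smallest box (x0, y0, x1, y1) enclosing the given lines, or None if empty."""
    lines = list(lines)
    if not lines:
        return None
    return (min(l.x0 for l in lines), min(l.y0 for l in lines),
            max(l.x1 for l in lines), max(l.y1 for l in lines))


def combine(a, b):
    """Placeholder rule: intersect the two boxes (the abstract gives no rule).

    If one box is missing, the other is returned unchanged; if the boxes do
    not overlap, None is returned.
    """
    if a is None or b is None:
        return a or b
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 > x1 or y0 > y1:
        return None
    return (x0, y0, x1, y1)


def page_body_box(page, bfont):
    """Draw one box from BFont lines, one from non-header/footer lines, then combine."""
    font_box = bounding_box(l for l in page if l.font == bfont)
    trimmed_box = bounding_box(l for l in page if not l.is_header_or_footer)
    return combine(font_box, trimmed_box)
```

Under the intersection assumption, a page number set in the body font is excluded because the header and footer box cuts it off, at the cost of also trimming any body content that lies outside the BFont box. A different combination rule may suit other layouts better.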

Citation

Liangcai Gao, Zhi Tang, and Ruiheng Qiu "A mixed approach to auto-detection of page body", Proc. SPIE 6815, Document Recognition and Retrieval XV, 68150T (28 January 2008); https://doi.org/10.1117/12.765886
