22 December 1999 Evaluation of decision forests on text categorization
Hao Chen, Tin Kam Ho
Proceedings Volume 3967, Document Recognition and Retrieval VII; (1999) https://doi.org/10.1117/12.373494
Event: Electronic Imaging, 2000, San Jose, CA, United States
Abstract
Text categorization is useful for indexing documents for information retrieval, filtering parts for document understanding, and summarizing contents of documents of special interest. We describe a text categorization task and an experiment using documents from the Reuters and OHSUMED collections. We applied the Decision Forest classifier and compared its accuracy to that of the C4.5 and kNN classifiers using both category-dependent and category-independent term selection schemes. Decision Forest outperforms both C4.5 and kNN in all cases, and category-dependent term selection yields better accuracy. The performance of all three classifiers degrades from the Reuters collection to the OHSUMED collection, but Decision Forest remains superior.
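The distinction between the two term selection schemes can be illustrated with a minimal Python sketch. The abstract does not state which selection criteria the authors used, so the scoring rules here are assumptions chosen for simplicity: document frequency for the category-independent scheme, and the absolute difference in relative document frequency for the category-dependent scheme. The function names and the toy documents are hypothetical.

```python
from collections import Counter

def doc_freq(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df

def select_independent(docs, k):
    """Category-independent: one shared vocabulary for all categories.
    Assumed criterion: the k terms with the highest document frequency."""
    df = doc_freq(docs)
    ranked = sorted(df, key=lambda t: (-df[t], t))  # ties broken alphabetically
    return ranked[:k]

def select_dependent(docs, labels, category, k):
    """Category-dependent: a separate vocabulary per category.
    Assumed criterion: the k terms whose relative document frequency
    differs most between in-category and out-of-category documents."""
    pos = [d for d, cats in zip(docs, labels) if category in cats]
    neg = [d for d, cats in zip(docs, labels) if category not in cats]
    df_pos, df_neg = doc_freq(pos), doc_freq(neg)

    def score(term):
        p = df_pos[term] / len(pos) if pos else 0.0
        n = df_neg[term] / len(neg) if neg else 0.0
        return abs(p - n)

    terms = set(df_pos) | set(df_neg)
    ranked = sorted(terms, key=lambda t: (-score(t), t))
    return ranked[:k]

# Toy collection (hypothetical): each document is a token list with a label set.
docs = [
    ["wheat", "price", "rise"],
    ["wheat", "export", "price"],
    ["oil", "price", "fall"],
    ["oil", "export", "price"],
]
labels = [{"grain"}, {"grain"}, {"crude"}, {"crude"}]

shared_terms = select_independent(docs, 2)
grain_terms = select_dependent(docs, labels, "grain", 2)
```

In this toy collection the shared vocabulary is dominated by "price", which occurs in every document and so cannot separate the categories, while the per-category vocabulary keeps "wheat" and "oil". This is only meant to show why a category-dependent scheme might help; it does not reproduce the paper's experiment.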
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Hao Chen and Tin Kam Ho "Evaluation of decision forests on text categorization", Proc. SPIE 3967, Document Recognition and Retrieval VII, (22 December 1999); https://doi.org/10.1117/12.373494
CITATIONS
Cited by 19 scholarly publications.
KEYWORDS
Binary data

Feature extraction

Feature selection

Algorithm development

Data modeling

Distance measurement

Neural networks
