In this paper we describe a modified classification method designed for extractive summarization. The classifier
does not need a training corpus; it is trained on the input text itself. First, we cluster the document's sentences to
capture the diversity of topics; then we train a learning algorithm (here, Naive Bayes) on the clusters, treating each
cluster as a class. Once the classification model is obtained, we compute the score of each sentence in each class using
a scoring model derived from the classification algorithm. These scores are then used to reorder the sentences, and the
top-ranked ones are extracted as the output summary.
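The pipeline described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify the clustering algorithm, the tokenizer, or how per-class scores are combined into one sentence score. Here we assume single-pass threshold clustering over bag-of-words cosine similarity, a multinomial Naive Bayes classifier with add-one smoothing, and a sentence score equal to its best length-normalized log-posterior over all classes. All function names and the default `threshold` and `k` values are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    """Lower-case word tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"\w+", sentence.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[w] for w, count in a.items() if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_sentences(bags: list[Counter], threshold: float) -> list[int]:
    """ASSUMPTION: single-pass threshold clustering. A sentence joins the most
    similar cluster if similarity >= threshold, else it starts a new cluster.
    Returns one cluster label (used later as a class label) per sentence."""
    centroids: list[Counter] = []
    labels: list[int] = []
    for bag in bags:
        best, best_sim = -1, -1.0
        for c, centroid in enumerate(centroids):
            sim = cosine(bag, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(bag)  # centroid kept as summed term counts
            labels.append(best)
        else:
            centroids.append(Counter(bag))
            labels.append(len(centroids) - 1)
    return labels


def train_naive_bayes(bags: list[Counter], labels: list[int]):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing, trained on
    the document's own sentences, with cluster ids as class labels."""
    vocab: set[str] = set()
    for bag in bags:
        vocab.update(bag)
    class_sizes = Counter(labels)
    word_counts = {c: Counter() for c in class_sizes}
    for bag, c in zip(bags, labels):
        word_counts[c].update(bag)
    n, v = len(labels), len(vocab)
    log_prior = {c: math.log(size / n) for c, size in class_sizes.items()}
    log_lik = {}
    for c, counts in word_counts.items():
        total = sum(counts.values())
        log_lik[c] = {w: math.log((counts[w] + 1) / (total + v)) for w in vocab}
    return log_prior, log_lik


def sentence_score(bag: Counter, log_prior: dict, log_lik: dict) -> float:
    """ASSUMPTION: score = max over classes of log P(c) plus the
    length-normalized log-likelihood of the sentence's words given c."""
    n_tokens = sum(bag.values())
    if n_tokens == 0:
        return float("-inf")
    best = float("-inf")
    for c in log_prior:
        ll = sum(count * log_lik[c][w] for w, count in bag.items())
        best = max(best, log_prior[c] + ll / n_tokens)
    return best


def summarize(sentences: list[str], k: int = 2, threshold: float = 0.2) -> list[str]:
    """Cluster, train on the clusters, score, reorder, keep the first k."""
    bags = [Counter(tokenize(s)) for s in sentences]
    labels = cluster_sentences(bags, threshold)
    log_prior, log_lik = train_naive_bayes(bags, labels)
    scores = [sentence_score(b, log_prior, log_lik) for b in bags]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:k]]


if __name__ == "__main__":
    document = [
        "Naive Bayes is a simple probabilistic classifier.",
        "The classifier assumes that features are independent.",
        "Clustering groups similar sentences into topics.",
        "Each topic cluster is treated as a class for the classifier.",
        "The weather was pleasant during the conference.",
    ]
    for sentence in summarize(document, k=2, threshold=0.2):
        print(sentence)
```

Raising `threshold` produces more, smaller clusters (at 1.1 every sentence becomes its own class), while lowering it to 0.0 merges everything into a single class. This is the kind of tuning whose effect on the summary the experiments examine.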
We conducted experiments on a corpus of scientific papers and compared our results with those of another summarization
system, UNIS. We also studied the impact of tuning the clustering threshold on the resulting summary, as well as the
impact of adding more features to the classifier. The results show that the method performs well, and that adding new
features (which is simple with this method) can improve the accuracy of the summary.
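Adding features is simple with this method. One plausible way to do it with a multinomial Naive Bayes classifier is to encode each extra feature as a pseudo-token appended to the sentence's word list, so the classifier and the scoring model need no changes. This is an assumption: the abstract does not say which features were added or how. The position and length features below, their bin boundaries, and the `__pos_`/`__len_` prefixes are hypothetical.

```python
import re


def sentence_features(sentence: str, index: int, n_sentences: int,
                      extra: bool = True) -> list[str]:
    """Return the feature list for one sentence: its word tokens, optionally
    followed by extra (hypothetical) features encoded as pseudo-tokens."""
    words = re.findall(r"\w+", sentence.lower())
    features = list(words)
    if extra:
        # HYPOTHETICAL feature 1: coarse position in the document
        # (0 = first third, 1 = middle third, 2 = last third).
        position_bin = min(2, 3 * index // max(n_sentences, 1))
        features.append(f"__pos_{position_bin}")
        # HYPOTHETICAL feature 2: short vs. long sentence; the cut-off of
        # 10 words is arbitrary.
        features.append("__len_short" if len(words) < 10 else "__len_long")
    return features
```

Because the extra features are just additional tokens, a bag-of-tokens Naive Bayes pipeline only needs to count `sentence_features(...)` instead of the plain word list. Comparing summaries with `extra=True` and `extra=False` isolates the contribution of the new features.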