We propose a novel scene categorization method based on multiscale category-specific visual words. The novelty of the proposed method lies in two aspects: (1) visual words are quantized in a multiscale manner that combines the global-feature-based and local-feature-based scene categorization approaches into a uniform framework; (2) unlike traditional visual word creation methods, which quantize visual words from the whole training set without considering categories, we form visual words from the training images grouped by category and then collate the visual words from the different categories to form the final codebook. This category-specific strategy enhances the discriminative ability of the visual words, which is useful for achieving better classification performance. Based on the codebook, we compile a feature vector that encodes the presence of the different visual words in a given image; an SVM classifier with a linear kernel is then employed to select the features and classify the images. The proposed method is evaluated over two scene classification data sets, with 8 and 13 scene categories, respectively, comprising 6,447 images altogether, using 10-fold cross-validation. The experimental results show that classification performance is significantly improved by using the multiscale category-specific visual words over that achieved with traditional visual words. Moreover, the proposed method is comparable with the best methods reported in the previous literature in terms of classification accuracy rate (88.81% and 85.05% for data sets 1 and 2, respectively) and has the advantage of simplicity.
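The abstract does not include an implementation, but the pipeline it describes (per-category clustering of local descriptors, a collated codebook, word-occurrence feature vectors, and a linear SVM) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the choice of k-means, the number of words per category, and the random stand-in descriptors are all assumptions, and multiscale descriptor extraction (e.g., dense patches sampled at several scales) is assumed to happen upstream.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def build_category_specific_codebook(descriptors_by_category,
                                     words_per_category=20, seed=0):
    """Cluster each category's descriptors separately, then collate the
    per-category cluster centers into one codebook (category-specific
    strategy, as opposed to clustering the whole training set at once)."""
    centers = []
    for descs in descriptors_by_category:
        km = KMeans(n_clusters=words_per_category, n_init=10, random_state=seed)
        km.fit(descs)
        centers.append(km.cluster_centers_)
    return np.vstack(centers)  # (n_categories * words_per_category, dim)

def encode_image(descriptors, codebook):
    """Hard-assign each descriptor to its nearest visual word and return a
    normalized histogram encoding the presence of the different words."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

# Synthetic demo: random vectors stand in for multiscale local descriptors.
rng = np.random.default_rng(0)
dim, n_categories = 128, 3
descriptors_by_category = [rng.normal(c, 1.0, size=(500, dim))
                           for c in range(n_categories)]
codebook = build_category_specific_codebook(descriptors_by_category)

# Treat each chunk of 100 descriptors as one "image", then train a linear SVM.
X, y = [], []
for label, descs in enumerate(descriptors_by_category):
    for i in range(0, 500, 100):
        X.append(encode_image(descs[i:i + 100], codebook))
        y.append(label)
clf = LinearSVC().fit(np.array(X), np.array(y))
```

Because the codebook is a concatenation of per-category centers, each histogram bin is associated with one category, which is what makes the resulting feature vector more discriminative than one built from a codebook clustered over all categories jointly.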