Learning to Recognize Objects and Scenes with Bags of Features and Spatial Pyramids

Svetlana Lazebnik
University of Illinois at Urbana-Champaign

Monday, April 2, 11:00AM
Babbio Center, Room 202
Stevens Institute of Technology
 

Abstract


Bag-of-features models, which represent images by distributions of salient local features contained in them, are among the most robust and powerful image descriptions currently used for object and scene recognition. In this talk, I will present fundamental techniques for learning effective bag-of-features models and their extensions by constructing discriminative visual codebooks and incorporating spatial relationships between local features.

The most basic operation in building a bag-of-features model is quantizing the local features, so that their distribution can be represented as a histogram of discrete "visual codewords." I will introduce an information-theoretic approach to learning visual codebooks by minimizing the loss of discriminative information incurred when a continuous high-dimensional feature vector is mapped to a discrete codeword index. I will present experiments demonstrating the advantage of these codebooks for image classification, as well as an application of the same information-theoretic framework to image segmentation.
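
As a rough illustration of the quantization step described above, the sketch below maps each local feature to its nearest codeword and builds a normalized histogram. It assumes a codebook is already available and uses plain nearest-neighbor assignment under Euclidean distance; it does not implement the information-theoretic codebook learning presented in the talk. The function names, the random data, and the dimensions are hypothetical.

```python
import numpy as np

def quantize(features, codebook):
    """Return the index of the nearest codeword for each feature vector."""
    # Squared Euclidean distance between every feature and every codeword.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bag_of_features(features, codebook):
    """Histogram of codeword indices, normalized to sum to 1."""
    idx = quantize(features, codebook)
    hist = np.bincount(idx, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

# Hypothetical data: 100 local descriptors of dimension 16, 8 codewords.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))
features = rng.normal(size=(100, 16))
hist = bag_of_features(features, codebook)
```

Here `hist` has one bin per codeword, so two images with different numbers of local features can still be compared through vectors of the same length.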

In the second part of the talk, I will describe an extension of the bag-of-features representation to a spatial pyramid: a collection of feature histograms computed at different levels of a hierarchical spatial decomposition of an image. The resulting method is simple and efficient, and it achieves state-of-the-art performance on difficult object and scene recognition tasks. It has already been adopted as a baseline for datasets containing hundreds of object categories, and it has given rise to a winning recognition system in the international PASCAL Visual Object Classes Challenge.
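
The spatial pyramid idea can be sketched as follows, under several assumptions not stated in the abstract: feature positions are normalized to the unit square, the decomposition at level l is a regular 2^l by 2^l grid, and the per-cell histograms are concatenated without any per-level weighting. The function name and the sample data are hypothetical.

```python
import numpy as np

def spatial_pyramid(xy, words, num_words, num_levels):
    """Concatenate per-cell codeword histograms for levels 0..num_levels."""
    xy = np.asarray(xy, dtype=float)
    words = np.asarray(words)
    parts = []
    for level in range(num_levels + 1):
        cells = 2 ** level  # grid is cells x cells at this level
        # Grid cell of each feature; clip so a coordinate of 1.0 stays in range.
        col = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
        row = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
        cell = row * cells + col
        # One bin per (cell, codeword) pair.
        flat = cell * num_words + words
        parts.append(np.bincount(flat, minlength=cells * cells * num_words))
    return np.concatenate(parts).astype(float)

# Hypothetical data: 100 features with positions in [0, 1) and 8 codewords.
rng = np.random.default_rng(0)
xy = rng.random((100, 2))
words = rng.integers(0, 8, size=100)
pyramid = spatial_pyramid(xy, words, num_words=8, num_levels=2)
```

Level 0 of the result is the ordinary bag-of-features histogram for the whole image; the finer levels add progressively more localized histograms, so the vector length grows as 8 * (1 + 4 + 16) = 168 in this example.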