Learning to Recognize Objects and Scenes with Bags of Features and Spatial Pyramids
Svetlana Lazebnik
University of Illinois at Urbana-Champaign
Monday, April 2, 11:00AM
Babbio Center, Room 202
Stevens Institute of Technology
Abstract
The most basic operation in building a bag-of-features model is quantizing the local features, so that their distribution can be represented as a histogram of discrete "visual codewords." I will introduce an information-theoretic approach to learning visual codebooks by minimizing the loss of discriminative information incurred when a continuous high-dimensional feature vector is mapped to a discrete codeword index. I will present experiments demonstrating the advantage of these codebooks for image classification, as well as an application of the same information-theoretic framework to image segmentation.
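As a rough illustration of the quantization step described above, the following minimal sketch maps each local feature to its nearest codeword and counts the results in a normalized histogram. It assumes a codebook is already given and uses plain nearest-neighbor (Euclidean) assignment; it does not implement the information-theoretic codebook learning that the talk introduces. The function names `quantize` and `bag_of_features` are hypothetical.

```python
import numpy as np

def quantize(features, codebook):
    """Return, for each feature (row), the index of its nearest codeword."""
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distances, shape (n_features, n_codewords).
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bag_of_features(features, codebook):
    """Histogram of codeword indices, normalized to sum to 1."""
    idx = quantize(features, codebook)
    hist = np.bincount(idx, minlength=len(codebook)).astype(float)
    # Guard against division by zero when there are no features.
    return hist / max(hist.sum(), 1.0)

# Toy data (assumed for illustration): two 2-D codewords, three features.
codebook = [[0.0, 0.0], [10.0, 10.0]]
features = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
hist = bag_of_features(features, codebook)
```

In this toy case two features fall nearest the first codeword and one nearest the second, so the histogram puts two thirds of its mass in the first bin.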
In the second part of the talk, I will describe an extension of the bag-of-features representation to a spatial pyramid: a collection of feature histograms computed at different levels of a hierarchical spatial decomposition of an image. The resulting method is simple and efficient, and it achieves state-of-the-art performance on difficult object and scene recognition tasks. It has already been adopted as a baseline for datasets containing hundreds of object categories, and has given rise to a winning recognition system in the international PASCAL Visual Object Classes Challenge.
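The idea of histograms computed over a hierarchical spatial decomposition can be sketched as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: it assumes each feature already has an image position and a codeword index, splits the image into a 2^l by 2^l grid at level l, and concatenates raw per-cell counts. Any per-level weighting, normalization, or kernel used for classification is omitted. The function name `spatial_pyramid` and its parameters are hypothetical.

```python
import numpy as np

def spatial_pyramid(xy, codeword_idx, n_codewords, width, height, levels=2):
    """Concatenate per-cell codeword histograms for levels 0..levels."""
    xy = np.asarray(xy, dtype=float)
    codeword_idx = np.asarray(codeword_idx, dtype=int)
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level  # grid is cells x cells at this level
        # Grid column and row of each feature; clip so that points on the
        # right or bottom image border stay inside the last cell.
        cx = np.clip((xy[:, 0] / width * cells).astype(int), 0, cells - 1)
        cy = np.clip((xy[:, 1] / height * cells).astype(int), 0, cells - 1)
        # One flat bin per (cell, codeword) pair.
        flat = (cy * cells + cx) * n_codewords + codeword_idx
        parts.append(np.bincount(flat, minlength=cells * cells * n_codewords))
    return np.concatenate(parts).astype(float)

# Toy data (assumed): two features in a 10 x 10 image, two codewords.
pyramid = spatial_pyramid([[1, 1], [9, 9]], [0, 1], n_codewords=2,
                          width=10, height=10, levels=1)
```

With `levels=1` the result has 2 bins for the whole image plus 8 bins for the four quadrants. Level 0 is the ordinary bag-of-features count, so the pyramid strictly extends the orderless representation with coarse spatial layout.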