From Movies to 3D Models: the Structure-from-Motion Problem

Dr. John Oliensis
NEC Research Institute
Princeton, NJ

Thursday, September 5 at 2:00PM
Lieb 3rd floor Conference Room
 

Abstract


I describe some of my recent results on Structure from Motion (SFM). SFM is the most studied problem in Computer Vision and the most robust way to make inferences about the 3D world from images. The problem is: given a sequence of photographic images of a fixed 3D scene, taken by a moving camera, estimate 1) a 3D geometric model of the scene (the structure), and 2) the camera's motion, i.e., its position and orientation for each image. Solving SFM will enable users to create 3D models of their environment by waving a video camera at it.

Formally, SFM is an optimization problem: the goal is to find the estimates of the 3D scene and the camera motion that minimize an ``error function,'' which measures the discrepancy between the images predicted from the estimates and the actual images. My recent results include:

1) Structure from Motion using two images is the most basic form of the problem and has great practical importance, since most current algorithms use a two-image technique in their initial stages. I describe a simple, exact expression for the two-image error function that depends only on the camera motion (not on the unknown structure). This leads to an exact algorithm that is much faster and more reliable than the current method.

2) For two images, I present an analytic model of the error function's dependence on the motion estimate. This gives essentially a complete understanding of the intrinsic problem that an SFM algorithm has to solve and also explains a phenomenon of human vision. By using this analysis to detect and avoid algorithms' mistakes, one can achieve more reliable algorithms. Previous analyses of the error function captured only a single aspect of its motion dependence.

3) To obtain accurate, detailed 3D models, one must accumulate information over many images. I describe the first SFM algorithms that compute estimates *directly* from *arbitrarily* many photographic images, without iterating, without relying on an initial guess at the unknowns, and without needing a prior computation of the correspondence between images (i.e., of which points in different images represent the same 3D points). When applied to correspondences that have been pre-computed, this approach gives better results than the Sturm/Triggs algorithm.
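
As an illustration of the ``error function'' mentioned above, the following minimal sketch evaluates one common choice, the sum of squared reprojection errors over tracked points. It is not the formulation used in the talk. It assumes a pinhole camera with unit focal length and known point correspondences, and all names (`project`, `reprojection_error`, `rotation_about_y`) and the synthetic data are hypothetical.

```python
import math


def rotation_about_y(angle):
    """3x3 rotation matrix for a rotation of `angle` radians about the y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def project(point, rotation, translation):
    """Project a 3D point into an image (assumed unit-focal-length pinhole camera)."""
    # Camera-frame coordinates: R * X + t.
    cam = [
        sum(rotation[r][k] * point[k] for k in range(3)) + translation[r]
        for r in range(3)
    ]
    # Perspective division by depth.
    return (cam[0] / cam[2], cam[1] / cam[2])


def reprojection_error(points, cameras, observations):
    """Sum of squared distances between predicted and observed image points.

    points:       list of estimated 3D points (the structure)
    cameras:      list of (rotation, translation) pairs, one per image (the motion)
    observations: observations[i][j] is the measured position of point j in image i
    """
    total = 0.0
    for i, (rotation, translation) in enumerate(cameras):
        for j, point in enumerate(points):
            u, v = project(point, rotation, translation)
            obs_u, obs_v = observations[i][j]
            total += (u - obs_u) ** 2 + (v - obs_v) ** 2
    return total


# Synthetic scene: four points in front of the camera (made-up values).
true_points = [(0.0, 0.0, 5.0), (1.0, 0.5, 4.0), (-1.0, 0.2, 6.0), (0.5, -0.7, 5.5)]

# Two camera poses: the identity, and a small rotation plus a sideways shift.
true_cameras = [
    (rotation_about_y(0.0), (0.0, 0.0, 0.0)),
    (rotation_about_y(0.1), (0.3, 0.0, 0.0)),
]

# "Actual images": the projections of the true points under the true motion.
observations = [[project(p, R, t) for p in true_points] for (R, t) in true_cameras]

# A wrong motion estimate for the second image.
wrong_cameras = [true_cameras[0], (rotation_about_y(0.2), (0.3, 0.0, 0.0))]

error_at_truth = reprojection_error(true_points, true_cameras, observations)
error_at_wrong = reprojection_error(true_points, wrong_cameras, observations)
print("error at true motion: ", error_at_truth)
print("error at wrong motion:", error_at_wrong)
```

The error vanishes at the true structure and motion and grows as the motion estimate moves away from it. An SFM algorithm searches this surface for its minimum over all structure and motion unknowns.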