Low Rank Matrix Factorization and Relative Pose Problems in Computer Vision

Summary (Popular Abstract in English)

The ultimate goal of computer vision is to make computers "see" as humans do. One essential step toward this goal is enabling computers to perceive three-dimensional (3D) space as it exists in the real world. In this thesis, we investigate the problem of reconstructing a 3D model of a scene from ordinary two-dimensional (2D) images. More specifically, given a set of images of the same scene taken from different viewpoints, we want a computer program that automatically builds a 3D model of the scene. This model usually takes the form of a 3D point cloud that describes the geometry of the scene. In addition, the program should determine the pose of each camera, that is, where each camera is located relative to the scene and in which direction it points. Solving this problem enables a wide range of applications, for example building a 3D map of a city or reconstructing a 3D model of an object, which can then be fed into a 3D printer.

The first part of the thesis contributes to a family of methods that recover the 3D scene model and the camera poses simultaneously. We start with feature matching, that is, finding the 2D image points in different images that are projections of the same 3D point. All the 2D points that correspond to a given scene point form a so-called point track. It is well known that, once all point tracks are collected into a matrix, the camera poses and the 3D scene points can be retrieved (under suitable camera models) by decomposing this matrix into the product of two smaller matrices. This decomposition is referred to as low-rank matrix factorization. We contribute to the factorization problem with a new method that can handle missing elements in the matrix, which correspond to point tracks that are missing or occluded in some images. For example, when a set of cameras surrounds an object, points on the back side of the object are invisible to the cameras in front. The proposed method is also robust to outliers: it can still correctly recover the 3D scene points and the camera poses when some of the point tracks are wrong due to inaccurate feature matching.

In the second part of the thesis, we focus on estimating camera pose. We are especially interested in the two-view relative pose problem: given two images, estimate how one camera is rotated and translated relative to the other. Solving the two-view problem is a fundamental building block of large-scale reconstruction systems with many cameras. Cameras with wide-angle lenses, such as the GoPro series, include more of the scene in each photograph, and fisheye lenses are very common in surveillance cameras. Images from such cameras are distorted in the sense that a straight line in the 3D scene no longer projects to a straight line in the 2D image. This effect, known as radial distortion, is stronger near the border of the image than at its centre. If the distortion is not modeled appropriately, the reconstructed 3D model will look skewed. We explicitly model radial distortion when estimating the relative pose between two cameras and propose an efficient way of solving the resulting problem. Beyond that, we present a so-called brute-force algorithm for the relative pose problem. It systematically enumerates all possible candidate solutions and is therefore guaranteed to find the optimal one. The algorithm is efficient: a parallelized version runs on a graphics card (GPU) and evaluates many candidate solutions simultaneously. It is also robust to outliers and can easily be adapted to restricted camera motions, for example when the cameras move within a plane.
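The radial distortion effect described in the abstract can be made concrete with a short sketch. It uses the one-parameter division model, x_u = x_d / (1 + lam * r_d^2), which is one common choice in the radial-distortion literature. This abstract does not say which model the thesis uses, so treat the model, the coefficient `lam`, the coordinates normalized to roughly [-1, 1] around the image centre, and the function names as assumptions. The sketch checks two claims from the text: the correction grows toward the image border, and straight lines do not stay straight.

```python
import math


def undistort(xd, yd, lam):
    """One-parameter division model: x_u = x_d / (1 + lam * r_d^2).

    Coordinates are measured from the distortion centre (assumed here to be
    the image centre); lam < 0 corresponds to barrel distortion.
    """
    s = 1.0 / (1.0 + lam * (xd * xd + yd * yd))
    return xd * s, yd * s


def shift(xd, yd, lam):
    """Distance a distorted point moves when it is undistorted."""
    xu, yu = undistort(xd, yd, lam)
    return math.hypot(xu - xd, yu - yd)


def collinearity(p, q, s):
    """Twice the signed area of triangle pqs; zero exactly when p, q, s are collinear."""
    return (q[0] - p[0]) * (s[1] - p[1]) - (q[1] - p[1]) * (s[0] - p[0])


lam = -0.2  # hypothetical barrel-distortion coefficient
for r in (0.0, 0.1, 0.5, 0.9):
    print(f"radius {r:.1f}: shift {shift(r, 0.0, lam):.4f}")

# Three points on the straight image line y = 0.5 ...
line = [(-0.8, 0.5), (0.0, 0.5), (0.8, 0.5)]
# ... are no longer collinear after undistortion.
bent = [undistort(x, y, lam) for x, y in line]
print(collinearity(*line), collinearity(*bent))
```

The displacement is zero at the centre and grows roughly with the cube of the radius, which matches the statement that distortion is strongest at the border. The mapping between distorted and undistorted coordinates does not preserve straight lines. As a result, scene lines that are straight in an ideal pinhole image generally appear curved in the recorded image.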
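The brute-force idea in the abstract can be sketched for the restricted planar-motion case it mentions. The sketch rests on several assumptions that do not come from the thesis:

- calibrated cameras with normalized image coordinates;
- motion consisting of a rotation `theta` about the vertical y-axis and a unit translation at angle `phi` in the x-z plane;
- the algebraic epipolar error |x2^T [t]_x R x1| as the inlier test;
- a fixed 2-degree grid of candidates.

The names `brute_force`, `rot_y`, and `eps` and the synthetic data are hypothetical. This grid search is optimal only up to the grid resolution, so it is a simplified illustration rather than the algorithm in the thesis.

```python
import math
import random


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def rot_y(theta, x):
    """Rotate the 3-vector x by angle theta about the vertical (y) axis."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] + s * x[2], x[1], -s * x[0] + c * x[2])


def brute_force(pairs, eps=1e-3, step_deg=2):
    """Exhaustive search over planar motions (theta, phi) maximizing the inlier count.

    pairs holds normalized homogeneous points (x1, x2); an exact match
    satisfies x2^T [t]_x R x1 = 0. Returns (inliers, theta_deg, phi_deg).
    """
    best = (-1, None, None)
    for th_deg in range(-90, 90, step_deg):
        theta = math.radians(th_deg)
        # R x1 depends only on theta, so compute it once per rotation candidate.
        rotated = [(rot_y(theta, x1), x2) for x1, x2 in pairs]
        for ph_deg in range(0, 180, step_deg):  # t and -t give the same constraint
            phi = math.radians(ph_deg)
            t = (math.cos(phi), 0.0, math.sin(phi))
            # Algebraic epipolar error |x2 . (t x R x1)| = |x2^T [t]_x R x1|.
            inliers = sum(abs(dot(x2, cross(t, rx1))) < eps for rx1, x2 in rotated)
            if inliers > best[0]:
                best = (inliers, th_deg, ph_deg)
    return best


# Synthetic planar motion: 20 degrees of rotation, translation at 60 degrees.
rng = random.Random(0)
true_theta, true_phi = math.radians(20), math.radians(60)
t_true = (math.cos(true_phi), 0.0, math.sin(true_phi))
pairs = []
for k in range(40):
    X1 = (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(4, 8))
    X2 = tuple(p + q for p, q in zip(rot_y(true_theta, X1), t_true))
    x1 = (X1[0] / X1[2], X1[1] / X1[2], 1.0)
    if k < 30:  # correct matches
        x2 = (X2[0] / X2[2], X2[1] / X2[2], 1.0)
    else:       # 10 outliers: random positions in the second image
        x2 = (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5), 1.0)
    pairs.append((x1, x2))

inliers, th, ph = brute_force(pairs)
print(f"{inliers} inliers at theta={th} deg, phi={ph} deg")
```

Each (theta, phi) candidate is scored independently of all the others. The nested loops are therefore the part that could be spread across GPU threads in a parallel version. Counting inliers instead of summing squared errors is what makes the search tolerant to the 10 outlier matches.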