For my independent study project, I implemented a Structure from Motion (SfM) pipeline using an RGBD camera. The goal was to eventually produce a dense 3D reconstruction of the environment. The SfM (or Visual SLAM) pipeline has two main parts: a Tracking (Frontend) step that estimates the camera motion frame to frame, and a Backend step that detects loop closures and optimizes the map.
The SfM method I implemented produces a sparse point cloud of map points. To get a dense point cloud, I take the camera poses output by SfM and fuse the depth maps from each frame.
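To make the fusion step concrete, here is a minimal Python sketch of the idea: back-project every valid depth pixel through the pinhole camera model, then transform the points into the world frame using the pose SfM estimated for that frame. The `K`, `depth_maps`, and `poses` inputs are placeholders for illustration, not values from my actual pipeline.

```python
import numpy as np

def backproject(depth, K):
    """Back-project a depth map (H x W, meters) into camera-frame 3D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0  # skip pixels with no depth reading
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)[valid]

def fuse(depth_maps, poses, K):
    """Fuse per-frame depth maps into one world-frame point cloud.

    poses[i] is assumed to be a 4x4 camera-to-world transform from SfM.
    """
    cloud = []
    for depth, T in zip(depth_maps, poses):
        pts = backproject(depth, K)
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coords
        cloud.append((T @ pts_h.T).T[:, :3])
    return np.vstack(cloud)
```

In practice the raw concatenated cloud is huge, so some form of downsampling (e.g. voxel filtering) is usually applied afterwards.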
The first step in Structure from Motion is estimating the camera motion, which happens in the Tracking (or Frontend) step. Motion is estimated in a seven-step process, shown in the diagram below:
A diagram of the tracking process
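The heart of the tracking step is recovering the current camera pose from feature matches against the previous frame or keyframe. With an RGBD camera we already have 3D positions for the reference features, so this reduces to a 3D-2D (PnP) problem. Below is a rough sketch of that single step using OpenCV's RANSAC PnP solver; the surrounding data structures and the inlier threshold are assumptions for illustration, not my exact implementation.

```python
import cv2
import numpy as np

def estimate_pose(kf_points3d, frame_keypoints, matches, K):
    """Estimate the camera pose from 3D map points (from the RGBD keyframe)
    matched against 2D keypoints in the current frame (3D-2D PnP)."""
    obj_pts = np.float32([kf_points3d[m.queryIdx] for m in matches])
    img_pts = np.float32([frame_keypoints[m.trainIdx].pt for m in matches])
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts, img_pts, K, None,
        reprojectionError=3.0, iterationsCount=100)
    if not ok or inliers is None or len(inliers) < 20:
        return None  # tracking lost; fall back to relocalization
    R, _ = cv2.Rodrigues(rvec)  # world-to-camera rotation
    return R, tvec, inliers
```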
This tracking process provides an odometry estimate of the camera motion, akin to using wheel encoders. Like encoder odometry, it drifts over time, as illustrated by a complete run on KITTI sequence 00 below.
To correct this drift and improve the position estimate, we can implement Loop Closing, a component of the Backend in SLAM. Loop closing detects when the camera returns to a previously visited location and uses that as a constraint to correct the map.
Before we can close loops and optimize the map, we need to detect when the camera revisits a location. I built a place recognition module on top of a visual vocabulary and Bag of Words database (using the DBoW2 library), which lets me quickly query the database for images similar to the current frame. The query returns several candidate images, and identifying a true loop among them is then a three-step process.
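DBoW2 itself is a C++ library, but the core of the query is easy to sketch: quantize an image's descriptors against a visual vocabulary, build a normalized word histogram, and rank database images by a similarity score. The L1-based score below is one of the scoring schemes DBoW2 offers; the vocabulary is assumed given (in the real system it comes from offline k-means training).

```python
import numpy as np

def bow_vector(descriptors, vocabulary):
    """Quantize descriptors to their nearest visual word and return a
    normalized word-frequency histogram (a Bag of Words vector)."""
    # Nearest-centroid assignment (vocabulary: k x d float array).
    dists = np.linalg.norm(
        descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = np.argmin(dists, axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

def l1_score(v1, v2):
    """DBoW2-style L1 similarity: 1 for identical vectors, 0 for disjoint."""
    return 1.0 - 0.5 * np.abs(v1 - v2).sum()

def query(db, current_vec, top_k=5):
    """Rank stored keyframe vectors against the current frame and keep the
    best-scoring candidates for the loop verification steps."""
    scores = [(l1_score(current_vec, v), i) for i, v in enumerate(db)]
    return sorted(scores, reverse=True)[:top_k]
```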
After we have identified a loop, we close it by fusing the duplicated map points and features in the current frame with those in the looped frame. This creates an edge in the pose graph. The graph is then optimized using Pose Graph Optimization; I used the Ceres solver for this. This process corrects the drift, as shown in the before and after images below.
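My pipeline uses Ceres (C++) for the optimization; to show the structure of the problem itself, here is a minimal 2D pose-graph sketch built on scipy.optimize.least_squares instead. Odometry edges chain consecutive poses, a single loop-closure edge ties the last pose back to the first, and the solver spreads the accumulated drift along the chain. All poses and measurements here are invented for illustration.

```python
import numpy as np
from scipy.optimize import least_squares

def relative_pose(xi, xj):
    """Relative pose of j expressed in frame i (x, y, theta)."""
    dx, dy = xj[0] - xi[0], xj[1] - xi[1]
    c, s = np.cos(xi[2]), np.sin(xi[2])
    dth = xj[2] - xi[2]
    return np.array([c * dx + s * dy,
                     -s * dx + c * dy,
                     np.arctan2(np.sin(dth), np.cos(dth))])

def residuals(flat_poses, edges):
    poses = flat_poses.reshape(-1, 3)
    res = []
    for i, j, meas in edges:
        err = relative_pose(poses[i], poses[j]) - meas
        err[2] = np.arctan2(np.sin(err[2]), np.cos(err[2]))  # wrap angle
        res.append(err)
    # Anchor the first pose so the problem has a unique solution.
    res.append(poses[0] * 10.0)
    return np.concatenate(res)

# Four poses around a square; odometry drifts, the loop edge (3 -> 0) fixes it.
odo = np.array([1.0, 0.0, np.pi / 2])          # "drive 1m, turn 90 degrees"
edges = [(0, 1, odo), (1, 2, odo), (2, 3, odo),
         (3, 0, odo)]                          # last edge is the loop closure
init = np.array([[0, 0, 0], [1.1, 0, 1.5], [1.2, 1.1, 3.1],
                 [0.1, 1.2, -1.5]], dtype=float)  # drifted initial guess
sol = least_squares(residuals, init.ravel(), args=(edges,))
print(sol.x.reshape(-1, 3))
```

A real SLAM backend does the same thing over SE(3) poses with information-weighted residuals, but the structure of the problem is identical.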