In the original implementation of Consistent Depth Maps Recovery from a Video Sequence (Zhang et al., 2009), the algorithm picks 30-40 neighbors for each image in every bundle optimization iteration. Using that many neighbors reduces the flickering effect, which is essentially depth that is inconsistent from frame to frame. The inconsistency comes from errors in pixel-wise disparity estimation, errors in the camera matrices, and so on.
Small inconsistent patches can usually be rectified in a single iteration. However, more iterations are needed if:
- The error persists across many consecutive frames
- The patches are very large
- Both of the above

These cases lead to time-consuming computation. I proposed two ways to select neighbors when we need to handle initialization errors.
The fundamental matrix
Before that, I’d like to recommend this catchy song as it goes through almost every detail of the fundamental matrix in the book (Hartley & Zisserman, 2003).
Problems
Motion blur
While working on the assignment, I soon realized that summing up neighboring disparity maps is a trade-off between the denoising quality and the level of motion blur.
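To make the trade-off concrete, here is a minimal sketch of combining neighboring disparity maps. It assumes the maps are stacked in an array of shape `(T, H, W)` and combined with a plain per-pixel mean, without any warping between frames; the function name `average_neighbors` and the `radius` parameter are hypothetical and not taken from the assignment code.

```python
import numpy as np

def average_neighbors(disparities, t, radius):
    """Per-pixel mean of the disparity maps in frames [t - radius, t + radius].

    Assumption: `disparities` has shape (T, H, W) and the frames are
    averaged as-is, with no alignment between them.
    """
    lo = max(0, t - radius)                      # clamp the window at the sequence start
    hi = min(len(disparities), t + radius + 1)   # and at the sequence end
    return np.mean(disparities[lo:hi], axis=0)

# Toy sequence: a step edge that moves one column to the right per frame.
frames = np.zeros((5, 2, 5))
for k in range(5):
    frames[k][:, :k] = 1.0

blurred = average_neighbors(frames, t=2, radius=2)
print(blurred[0])  # the sharp edge of frame 2 becomes a ramp
```

A larger `radius` averages away more per-frame noise in static regions, but as the toy sequence shows, it also smears any edge that moves during the window.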


Erroneous patches

The inconsistency comes from errors during pixel-wise disparity estimation, errors in camera matrices, and so on. These errors later affect the quality of the initialization process.
Solutions
Extra bundle optimization steps
Bundle optimization is designed to mitigate errors using neighbors, usually 30-50 images in total, taken from before and after the frame being optimized. However, including more neighbors and more iterations increases computation time. I proposed two tweaks for choosing reference frames more efficiently and reliably.
Random neighbor selection
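Randomly picked neighbors can fix more pixels in one iteration than adjacent neighbors can. When the same error persists across many consecutive frames, the adjacent frames tend to share it, while frames farther away are more likely to be free of it.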
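The difference between the two strategies can be sketched as follows. This is only an illustration: the function names are hypothetical, and drawing uniformly from the whole sequence is an assumption, since the range to sample from is not specified here.

```python
import random

def adjacent_neighbors(t, num_frames, count):
    """Up to `count` frames closest to frame t, half before and half after."""
    half = count // 2
    before = range(max(0, t - half), t)
    after = range(t + 1, min(num_frames, t + half + 1))
    return list(before) + list(after)

def random_neighbors(t, num_frames, count, rng=None):
    """`count` distinct frames drawn uniformly from all frames except t.

    Assumption: every other frame in the sequence is a candidate.
    """
    if rng is None:
        rng = random.Random()
    candidates = [i for i in range(num_frames) if i != t]
    return sorted(rng.sample(candidates, min(count, len(candidates))))

print(adjacent_neighbors(50, 100, 4))
print(random_neighbors(50, 100, 4, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes a run repeatable, but it does not balance the draw between the frames before and after `t`.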

Random neighbor selection has several drawbacks:
- It is based on random selection, so the stability is not guaranteed.
- The neighbors on the left and right should be balanced in number, which a random draw does not ensure.
- We still need to select as many neighbors as before, making the computation time-consuming.
- We might still select “bad” images as our neighbors!
Reliable neighbor voting
My goal is to construct a quality indicator for the initialized images, so that we know which images are more “correct” and “reliable” and can therefore serve as good references for others, and which images are more “wrong” and should be fixed.
Here comes the idea of reliable neighbor voting:
When the initialization process finishes, we can use the initialized disparity and the camera matrices to match each pair of images and see how similar they are. By projecting one image onto another, we find the corresponding coordinates. Of course, some pixels are inliers and some are outliers that fall outside the other image. We focus only on the disparity values in the covered part.

By computing the sum of the squared Euclidean distances between corresponding disparity pixels in the two images, we get the inconsistency value. This is what I call “votes”.
Note that was slightly different from , due to the way I formulate it. Either can show us the “correctness” of frames in a chart.
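The pairwise comparison described in this section can be sketched as below. It is a rough illustration under stated assumptions, not the exact formulation used here: the projected coordinates `coords_ab` are assumed to be computed elsewhere from the disparity and camera matrices, nearest-pixel lookup is used instead of interpolation, and the function name `inconsistency` is hypothetical.

```python
import numpy as np

def inconsistency(disp_a, disp_b, coords_ab):
    """Sum of squared disparity differences over the pixels of A that land inside B.

    Assumption: `coords_ab` has shape (H, W, 2) and stores, for every pixel
    of image A, its projected (x, y) position in image B.
    Returns the sum and the number of covered pixels.
    """
    h, w = disp_b.shape
    x = np.rint(coords_ab[..., 0]).astype(int)   # nearest pixel column in B
    y = np.rint(coords_ab[..., 1]).astype(int)   # nearest pixel row in B
    covered = (x >= 0) & (x < w) & (y >= 0) & (y < h)  # drop pixels that fall outside B
    diff = disp_a[covered] - disp_b[y[covered], x[covered]]
    return float(np.sum(diff ** 2)), int(covered.sum())

# Toy example: image A maps onto image B shifted one pixel to the right.
disp = np.arange(6, dtype=float).reshape(2, 3)
ys, xs = np.mgrid[0:2, 0:3]
shifted = np.stack([xs + 1, ys], axis=-1).astype(float)
total, covered_pixels = inconsistency(disp, disp, shifted)
```

Because the sum grows with the size of the covered region, the sketch also returns the number of covered pixels, so pairs with different overlap can be compared per pixel if desired.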
