In the original implementation of Consistent Depth Maps Recovery from a Video Sequence (Zhang et al., 2009), the algorithm picks 30-40 neighbors for each image in every bundle optimization iteration. This reduces the flickering effect, which is essentially depth that is inconsistent from frame to frame. The inconsistency comes from errors in pixel-wise disparity estimation, errors in the camera matrices, and so on.

Small inconsistent patches can usually be rectified in a single iteration. However, more iterations are needed if:

1. The error persists across many consecutive frames
2. The patches are very large
3. Both 1 and 2 hold

More iterations make the computation time-consuming. I proposed two ways to select neighbors when we need to handle initialization errors.

The fundamental matrix

Before that, I’d like to recommend this catchy song as it goes through almost every detail of the fundamental matrix in the book (Hartley & Zisserman, 2003).

Problems

Motion blur

While working on the assignment, I soon realized that summing up neighboring disparity maps is a trade-off: more neighbors improve the denoising quality but also increase the motion blur.

Figure 1: More neighbors, more motion blur

Erroneous patches

Figure 2: How a good and a bad disparity map look (note the part in the sky)

The inconsistency comes from errors during pixel-wise disparity estimation, errors in camera matrices, and so on. These errors later degrade the quality of the initialization process.

Solutions

Extra bundle optimization steps

Bundle optimization is designed to mitigate errors using neighbors, usually picking 30-50 images in total from before and after the frame being optimized. However, including more neighbors and running more iterations increases the computational time. I proposed two tweaks to find reference frames in a more efficient and reliable way.

Random neighbor selection

Picking neighbors at random can fix more pixels in one iteration than using only the adjacent neighbors.

Figure 3: Most errors are cancelled in 1 run using random neighbors

Random neighbor selection has several drawbacks:

It is based on random selection, so stability is not guaranteed.
Ideally the number of neighbors on the left and on the right would be balanced, which random selection does not guarantee.
We still need to select as many neighbors as before, so the computation remains time-consuming.
We might still select “bad” images as our neighbors!
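To make the first two drawbacks concrete, here is a minimal sketch of how random selection could be made repeatable and balanced. It is not the code behind Figure 3: the function names, the `window` parameter, and the per-side sampling are my own assumptions for illustration.

```python
import random

def adjacent_neighbors(i, n_frames, k):
    """Return up to k frames on each side of frame i (the usual adjacent window)."""
    left = list(range(max(0, i - k), i))
    right = list(range(i + 1, min(n_frames, i + k + 1)))
    return left + right

def balanced_random_neighbors(i, n_frames, k, window, seed=None):
    """Sample up to k frames per side of frame i from a wider window.

    Sampling each side separately keeps left and right balanced,
    and a fixed seed makes the selection repeatable.
    Both `window` and `seed` are hypothetical parameters.
    """
    rng = random.Random(seed)
    left_pool = range(max(0, i - window), i)
    right_pool = range(i + 1, min(n_frames, i + window + 1))
    left = rng.sample(left_pool, min(k, len(left_pool)))
    right = rng.sample(right_pool, min(k, len(right_pool)))
    return sorted(left) + sorted(right)

# Example: 5 neighbors per side for frame 50, drawn from 30 frames on each side.
neighbors = balanced_random_neighbors(50, 100, k=5, window=30, seed=0)
```

This only addresses balance and repeatability. It does nothing about the cost of using many neighbors or the risk of picking "bad" frames.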

Reliable neighbor voting

My goal is to construct a quality indicator for the initialized images, so that we know which images are more “correct” and “reliable” and can therefore serve as good references for others, and which images are more “wrong” and should be fixed.

Here comes the idea of reliable neighbor voting:

When the initialization process finishes, we can use the initialized disparity and the camera matrices to match each pair of images and see how similar they are. By projecting one image onto another, we can find the corresponding coordinates. Of course, there will be inliers, plus some outliers that fall outside the other image. We focus only on the disparity values in the covered part.
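For reference, the projection step can be written out under one common pinhole convention. This is a sketch under assumptions rather than the exact formulation used here: I assume a world point $X$ projects into frame $i$ as $x \sim K_i R_i^\top (X - T_i)$, and that disparity is inverse depth, $d = 1/z$. Conventions for $R$ and $T$ differ between codebases, so the signs and transposes should be checked against your own camera model.

$$
x_j^h \sim K_j R_j^\top R_i K_i^{-1} x_i^h + d_i(x_i)\, K_j R_j^\top (T_i - T_j)
$$

Here $x_i^h$ is the homogeneous coordinate of a pixel in image $i$, $d_i(x_i)$ is its initialized disparity, and $x_j^h$ is the corresponding homogeneous coordinate in image $j$. A pixel counts as an inlier when the resulting coordinate lands inside image $j$.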

Figure 4: Inliers and outliers of the $i$-th image (L) given the $j$-th (R)

By computing the sum of the squared Euclidean distances between corresponding disparity pixels in the two images, we get the inconsistency value $I_{ij}$. These values are what I call “votes”.
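As a rough illustration of this step, the sketch below sums squared disparity differences over the inlier pixels only. The data layout is an assumption of mine: disparity maps are plain lists of rows, and `corr` is a hypothetical precomputed correspondence map that holds `None` for outliers. It also compares the two disparities directly, which is a simplification.

```python
def inconsistency(disp_i, disp_j, corr):
    """Sum of squared disparity differences over the inlier pixels of image i.

    disp_i, disp_j: disparity maps as lists of rows.
    corr: same shape as disp_i; corr[y][x] is the (row, col) that pixel
          (y, x) of image i lands on in image j, or None for an outlier.
    """
    total = 0.0
    for y, row in enumerate(corr):
        for x, target in enumerate(row):
            if target is None:  # outlier: projects outside image j
                continue
            yj, xj = target
            diff = disp_i[y][x] - disp_j[yj][xj]
            total += diff * diff
    return total

# Tiny 2x2 example with one outlier pixel.
disp_i = [[1.0, 2.0], [3.0, 4.0]]
disp_j = [[1.0, 2.5], [0.0, 4.0]]
corr = [[(0, 0), (0, 1)], [None, (1, 1)]]
value = inconsistency(disp_i, disp_j, corr)
```

Because the inlier set of $i$ projected into $j$ generally differs from that of $j$ projected into $i$, a function like this is not symmetric in its two images.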

$$\text{inconsistency votes sent from } i = \sum_{j \in N_{i}} I_{ij}$$

$$\text{inconsistency votes received by } i = \sum_{j \,:\, i \in N_{j}} I_{ji}$$

Note that $I_{ij}$ is slightly different from $I_{ji}$, due to the way I formulated it. Either quantity can show us the “correctness” of frames in a chart.
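The two vote totals can be accumulated in a single pass over all neighbor pairs. The following is a minimal sketch, assuming the pairwise values are stored in a dictionary keyed by `(i, j)` and the neighbor sets in a dictionary keyed by frame index. Both containers and the function name are hypothetical.

```python
def vote_totals(pairwise, neighbors):
    """Accumulate inconsistency votes sent and received per frame.

    pairwise: dict mapping (i, j) -> I_ij for every j in N_i.
    neighbors: dict mapping frame i -> list of its neighbors N_i.
    """
    sent = {i: 0.0 for i in neighbors}
    received = {i: 0.0 for i in neighbors}
    for i, n_i in neighbors.items():
        for j in n_i:
            value = pairwise[(i, j)]
            sent[i] += value
            # Frame j receives I_ij because it appears in N_i.
            received[j] = received.get(j, 0.0) + value
    return sent, received

# Three frames that are all neighbors of each other.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
pairwise = {(0, 1): 1.0, (0, 2): 2.0, (1, 0): 1.5,
            (1, 2): 4.0, (2, 0): 2.5, (2, 1): 3.5}
sent, received = vote_totals(pairwise, neighbors)
```

In this toy input, frame 2 has the largest total in both directions, which is the kind of frame the indicator is meant to flag.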

Figure 5: We can tell good frames apart by thresholding
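A thresholding step like the one in Figure 5 could look as follows. The post does not say how the threshold is chosen, so the rule used in the example (twice the median vote total) is purely a placeholder assumption, as are the function name and the sample numbers.

```python
import statistics

def split_by_threshold(votes, threshold):
    """Split frames into reliable and unreliable by their vote total.

    votes: dict mapping frame index -> accumulated inconsistency votes.
    Frames at or below the threshold are treated as reliable references.
    """
    reliable = sorted(i for i, v in votes.items() if v <= threshold)
    unreliable = sorted(i for i, v in votes.items() if v > threshold)
    return reliable, unreliable

# Made-up vote totals for four frames; frame 2 is clearly worse.
votes = {0: 4.0, 1: 4.5, 2: 30.0, 3: 5.0}
threshold = 2 * statistics.median(votes.values())  # placeholder rule
reliable, unreliable = split_by_threshold(votes, threshold)
```

The reliable frames can then serve as references, while the unreliable ones are the frames to fix.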

References

Hartley, R., & Zisserman, A. (2003). Multiple view geometry in computer vision. Cambridge University Press.

Zhang, G., Jia, J., Wong, T.-T., & Bao, H. (2009). Consistent depth maps recovery from a video sequence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6), 974–988.
