This is a review of Automatic Panoramic Image Stitching using Invariant Features (Brown & Lowe, 2007); for the predecessor paper on SIFT, see Lowe (2004) in the references. Assignment 1 of EE5731 Visual Computing is based on these two papers.

Homography

Input: pairs of anchors in both images

Output: the homography matrix

This paper uses homography as the fundamental transformation to warp and stitch images together to create a panorama. Think of homography as a mathematical function that maps points from one image plane to corresponding points on another image plane, as if you were viewing a flat scene from different angles. It’s more powerful than simple translation, rotation, and scaling because it accounts for perspective distortions.

The paper emphasizes that homography works best when the scene is mostly flat or when the camera rotates around its center of projection (no parallax). Because real scenes are rarely ideal, the paper stitches using homographies estimated between neighboring images in the panorama sequence. This matters especially when there are many images and not every pair overlaps. We’ll discuss this in Automatic Image Stitching.

In this example, if we correctly select at least 4 pairs of corresponding keypoints, we can solve for the homography and recover the relative rotation of the camera.
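As a concrete sketch (not the paper's code), here is how a homography can be estimated from exactly 4 hand-picked correspondences with OpenCV; the coordinates, file name, and canvas size below are made up for illustration:

```python
import numpy as np
import cv2

# Four hand-picked corresponding points (hypothetical pixel coordinates).
src_pts = np.float32([[10, 12], [300, 18], [305, 210], [8, 205]])   # image 1
dst_pts = np.float32([[42, 30], [330, 25], [338, 228], [35, 220]])  # image 2

# Exactly 4 non-degenerate pairs determine the 3x3 homography H, so that
# dst ~ H @ [x, y, 1]^T for each source point (x, y).
H = cv2.getPerspectiveTransform(src_pts, dst_pts)

# Warp image 1 into image 2's coordinate frame before blending/stitching.
img1 = cv2.imread("image1.jpg")                     # hypothetical file
warped = cv2.warpPerspective(img1, H, (1200, 800))  # output canvas size
```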

RANSAC

Input: all keypoints in both images

Output: the best pairs of keypoints to be the anchors

RANSAC stands for RANdom SAmple Consensus; it is a robust way to find the best set of matches. Here’s how it’s used (a code sketch of the full loop follows the steps below):

RANSAC algorithm

Feature matching

First, SIFT features are detected in each image and matched between pairs of images.
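A minimal sketch of this step, assuming OpenCV’s built-in SIFT and Lowe’s ratio test (the file names are placeholders):

```python
import cv2

img1 = cv2.imread("image1.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder files
img2 = cv2.imread("image2.jpg", cv2.IMREAD_GRAYSCALE)

# Detect SIFT keypoints and compute their descriptors in each image.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Match descriptors between the two images, keeping only matches that
# pass Lowe's ratio test to discard ambiguous correspondences.
matcher = cv2.BFMatcher()
knn = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in knn if m.distance < 0.75 * n.distance]
```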

Random sampling

RANSAC randomly selects a minimal set of matches (enough to compute a potential homography).

Homography calculation

Based on this small set of matches, a homography transformation is calculated.

Consensus set

The homography is then used to transform all other points in one image to the other. The algorithm then checks how many of the other matches also “agree” with this homography (i.e., the transformed point is close to its corresponding point in the other image). These agreeing matches form the “consensus set.”

Iteration

This process (random sampling, homography calculation, consensus set building) is repeated many times. The homography that yields the largest consensus set is selected as the best transformation between the images.
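Putting the sampling, fitting, consensus, and iteration steps together, here is a simplified, from-scratch sketch of the loop (NumPy/OpenCV; the iteration count and pixel threshold are illustrative choices, not values from the paper):

```python
import numpy as np
import cv2

def ransac_homography(pts1, pts2, n_iters=1000, thresh=3.0):
    """pts1, pts2: (N, 2) arrays of matched keypoint coordinates."""
    best_H, best_inliers = None, np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iters):
        # Random sampling: a minimal set of 4 matches.
        idx = np.random.choice(len(pts1), 4, replace=False)
        H = cv2.getPerspectiveTransform(np.float32(pts1[idx]),
                                        np.float32(pts2[idx]))
        # Consensus set: project every point from image 1 into image 2
        # and count how many land close to their matched point.
        homog = np.hstack([pts1, np.ones((len(pts1), 1))])
        proj = (H @ homog.T).T
        proj = proj[:, :2] / proj[:, 2:3]
        inliers = np.linalg.norm(proj - pts2, axis=1) < thresh
        # Iteration: keep the homography with the largest consensus set.
        if inliers.sum() > best_inliers.sum():
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```

In practice, OpenCV bundles this whole loop into cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0), which returns the estimated homography together with an inlier mask.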

In essence, RANSAC is a robust method for filtering out bad matches (“outliers”) that would otherwise throw off the homography estimation. By relying on a majority consensus of good matches, it provides a much more accurate and reliable result.

We can now handle basic image stitching tasks with anchor correspondences selected automatically by RANSAC.

References

Brown, M., & Lowe, D. G. (2007). Automatic panoramic image stitching using invariant features. International Journal of Computer Vision, 74, 59–73.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110.