Introduction
The first part of this project walks through the basics of stitching image mosaics, including computing homographies from correspondence points, using the homographies to warp images so that the correspondence points overlap with each other, and blending the warped images using Laplacian pyramids.
Part A: Image Warping and Mosaicing
Shoot the Pictures
To capture sets of photos that are projective transformations of each other, I need to fix the center of projection for each set. Therefore, I held my phone such that the camera stayed on the rotation axis, and made sure each image in a set has about 40% to 70% overlap with the center image.
[Figure: univhall_1.jpg, univhall_2.jpg (center image), univhall_3.jpg]
[Figure: entrance_1.jpg, entrance_2.jpg (center image), entrance_3.jpg]
[Figure: night_1.jpg, night_2.jpg (center image), night_3.jpg]
Recover Homographies
For each pair of images, I need to recover the projective transformation below, where $H$ is the homography matrix with 8 degrees of freedom:

$$\begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

After manually entering the correspondence points for each pair of images using ginput, I used np.linalg.pinv to calculate the least-squares solution for $H$ in the following system, where each $(x, y)$ and $(x', y')$ is a pair of correspondence points in the source and the target image:

$$\begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -x x' & -y x' \\ 0 & 0 & 0 & x & y & 1 & -x y' & -y y' \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix}$$

Each coordinate $(x, y)$ in the source image can be transformed to the target coordinate $(x', y')$ by multiplying by $H$ and then dividing by $w$, which acts as a scaling factor.
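To make the setup concrete, here is a minimal numpy sketch of this least-squares solve, assuming src and dst are (N, 2) arrays of correspondence points (the function name is my own):

    import numpy as np

    def compute_homography(src, dst):
        """Least-squares homography mapping src points to dst points."""
        A, b = [], []
        for (x, y), (xp, yp) in zip(src, dst):
            A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
            A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
            b.extend([xp, yp])
        h = np.linalg.pinv(np.array(A)) @ np.array(b)  # solve for a..h
        return np.append(h, 1).reshape(3, 3)           # fix H[2, 2] = 1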
Warp the Images
After calculating the homography for each pair of images, I warped the source image to the target image using the following warping algorithm:
1. Get the coordinates of the 4 corners of the source image and apply the homography to them. Use skimage.draw.polygon on the resulting coordinates to get all pixels in the transformed polygon that will be used in inverse warping, as well as to predict the size of the warped image.
2. For each pixel in this polygon, apply the inverse of the homography to its coordinates to get the sampling coordinate in the source image, and then interpolate this pixel from the source image using scipy.interpolate.griddata.
3. For pixels without any values in the warped image, set the values of the alpha channel to 0. This helps with image blending.
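The whole procedure can be condensed into the sketch below; for brevity this version samples with nearest-neighbor lookups rather than scipy.interpolate.griddata, and it assumes the warped polygon lands in non-negative coordinates:

    import numpy as np
    import skimage.draw

    def warp_image(img, H):
        """Inverse-warp an RGB image by homography H, returning an RGBA result."""
        h, w = img.shape[:2]
        corners = np.array([[0, 0, 1], [w, 0, 1], [w, h, 1], [0, h, 1]]).T
        proj = H @ corners
        proj = (proj[:2] / proj[2]).T                      # divide out w
        out_h = int(np.ceil(proj[:, 1].max()))
        out_w = int(np.ceil(proj[:, 0].max()))
        # Step 1: all output pixels inside the transformed polygon
        rr, cc = skimage.draw.polygon(proj[:, 1], proj[:, 0], (out_h, out_w))
        # Step 2: inverse-map each output pixel to a source coordinate
        Hinv = np.linalg.inv(H)
        pts = Hinv @ np.stack([cc, rr, np.ones_like(rr)])
        xs, ys = pts[0] / pts[2], pts[1] / pts[2]
        inside = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
        rr, cc = rr[inside], cc[inside]
        # Step 3: alpha stays 0 outside the polygon, set to 1 inside
        out = np.zeros((out_h, out_w, 4))
        out[rr, cc, :3] = img[ys[inside].astype(int), xs[inside].astype(int), :3]
        out[rr, cc, 3] = 1
        return out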
Image Rectification
With the above algorithms, I can use projective transformations to warp images into each other by defining correspondences. For image rectification, I manually marked the four corners of a known rectangular object in the input image, and then defined the correspondences using a function that calculates the corner coordinates of an axis-aligned rectangle with a size similar to the marked corners.
[Figure: computer.png + correspondences, and the rectified result]
[Figure: roof.jpg + correspondences, and the rectified result]
Using the four pairs of correspondences, I recovered the homography between the input image and the calculated rectangle, and then used the warping algorithm described above to warp the input image such that the known rectangular object is now a rectangle in the output image.
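As an illustration, a hypothetical helper for that rectangle calculation (the corner ordering is my assumption):

    import numpy as np

    def rectangle_from_corners(corners):
        """Axis-aligned rectangle sized like the marked quadrilateral.

        corners: (4, 2) array of (x, y) points in the order
        top-left, top-right, bottom-right, bottom-left.
        """
        width = np.linalg.norm(corners[1] - corners[0])    # top edge length
        height = np.linalg.norm(corners[3] - corners[0])   # left edge length
        rect = np.array([[0, 0], [width, 0], [width, height], [0, height]])
        return rect + corners[0]                           # anchor at the top-left corner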
Blend the Images into a Mosaic
To stitch a set of three photos into a mosaic, I defined the base photo as the "center" photo in the set, and marked the correspondence points between the base photo and the other two photos. In this case, I marked 9 points for each image, as this overdetermined system is relatively more stable and less prone to noise compared to only 4 points. Then, I warped the other photos into the base photo using the algorithms described above. However, when combining the warped photos into the mosaic, simply overwriting the photos with each other leads to visible edge artifacts.
[Figure: naive blending with visible edge artifacts]
To remove these artifacts, I first calculated a distance transform mask for each warped photo, where each pixel in the mask stores its Euclidean distance to the closest boundary of the warped image.
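A minimal sketch of this mask, assuming each warped photo carries the alpha channel set during warping:

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def distance_mask(warped):
        """Distance of each valid pixel to the nearest boundary of the warp."""
        valid = warped[..., 3] > 0    # alpha marks pixels that hold real values
        padded = np.pad(valid, 1)     # zero-pad so the outer edge counts as boundary
        return distance_transform_edt(padded)[1:-1, 1:-1]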
[Figure: univhall_1.jpg, univhall_2.jpg, and univhall_3.jpg with their distance transforms]
After getting the distance transform masks, I computed a 2-level Laplacian pyramid for each photo to produce a low-pass filtered blurred photo and a high-pass filtered photo with only high-frequency details, and then blended the low-pass photos and high-pass photos using different methods.
The overlapping region of each pair of low-pass photos is blended using a weighted linear combination, where the weights are determined by the distance transform masks, such that pixels closer to the center of each photo have higher weights.
The overlapping region of each pair of high-pass photos is blended using a binary mask calculated from the distance transform masks, such that each pixel in the overlapping region takes its value from the high-pass photo with the higher distance transform value.
Using this blending method, the result has smoother transitions and far fewer artifacts.
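The following sketch shows this two-band blend, assuming the photos are already warped into a shared mosaic frame and paired with the masks from above (the Gaussian sigma is an assumed value):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def two_band_blend(photos, masks):
        """Blend aligned (H, W, 3) photos using their distance-transform masks."""
        low_sum = np.zeros_like(photos[0])
        weights = np.zeros(photos[0].shape[:2])
        high = np.zeros_like(photos[0])
        best = np.full(photos[0].shape[:2], -1.0)
        for img, m in zip(photos, masks):
            low = gaussian_filter(img, sigma=(2, 2, 0))  # low-pass band
            hi = img - low                               # high-pass band
            low_sum += low * m[..., None]                # distance-weighted average
            weights += m
            take = m > best                              # winner-take-all for details
            high[take] = hi[take]
            best = np.maximum(best, m)
        return low_sum / np.maximum(weights, 1e-8)[..., None] + high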
[Figure: univhall_1.jpg, univhall_2.jpg + correspondences, univhall_3.jpg, and the univhall mosaic]
[Figure: entrance_1.jpg, entrance_2.jpg + correspondences, entrance_3.jpg, and the entrance mosaic]
[Figure: night_1.jpg, night_2.jpg + correspondences, night_3.jpg, and the night mosaic]
Part B: Feature Matching for Autostitching
For part B, I followed the paper "Multi-Image Matching using Multi-Scale Oriented Patches" (Brown, Szeliski, and Winder, 2005) to implement a system for automatically stitching images into a mosaic.
Harris Interest Point Detector
To automatically find interest points that can be matched into correspondences, I used the Harris interest point detector to locate "corners": points whose corner strength, a measure derived from the sum of squared differences between the surrounding image patch and its slightly shifted copies, is above a certain threshold.
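A minimal sketch of how such a detector can be invoked through skimage (the relative threshold is an assumed cutoff):

    from skimage.feature import corner_harris, peak_local_max

    def harris_corners(gray, min_distance=1):
        """Corner strength map and the (row, col) coordinates of its peaks."""
        strength = corner_harris(gray)               # per-pixel corner response
        coords = peak_local_max(strength, min_distance=min_distance,
                                threshold_rel=0.01)  # keep clear local maxima
        return strength, coords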
[Figure: univhall_2.jpg, and univhall_2.jpg with Harris corners overlaid]
As we can see in the image above with all Harris corners overlaid, this detector produces an extremely dense set of Harris corners without any suppression.
Adaptive Non-Maximal Suppression
Since too many interest points cause redundant computation, I used the Adaptive Non-Maximal Suppression (ANMS) algorithm, which reduces the number of interest points while distributing them evenly throughout the image. In ANMS, only interest points that have the locally maximal corner strength within a neighbourhood of radius $r$ pixels are retained. The algorithm starts with the maximum $r$, which finds the interest point with the globally maximal corner strength, and gradually decreases $r$, adding more locally maximal interest points to the list, until the number of interest points in the list reaches the target amount, which in my case is the top 500 points for each image.
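In practice this is usually computed per point as a suppression radius rather than by literally sweeping $r$; a minimal sketch under that formulation, with c_robust as the robustness factor from the paper:

    import numpy as np

    def anms(coords, strength, n_ip=500, c_robust=0.9):
        """Keep the n_ip interest points with the largest suppression radii."""
        # Squared distances between every pair of interest points
        d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
        # Point j suppresses point i only if i is significantly weaker than j
        suppresses = strength[:, None] < c_robust * strength[None, :]
        r = np.where(suppresses, d2, np.inf).min(axis=1)  # radius per point
        keep = np.argsort(-r)[:n_ip]                      # global max has r = inf
        return coords[keep]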
[Figure: univhall_2.jpg without suppression, and after ANMS with n_ip = 500 and c_robust = 0.9]
Feature Descriptor Extraction
After ANMS returns the interest points, the next step is to match each interest point with its correspondence in the other image. To do this, I extracted feature descriptors by sampling a 40x40 patch around each interest point and downsampling the large patches into 8x8 descriptors using a Gaussian low-pass filter. Since the patches are blurred by the low-pass filter before downsampling, aliasing is avoided.
Each 8x8 descriptor is also bias/gain-normalized such that for each descriptor, the mean of the pixel intensity is 0 and the standard deviation is 1. This normalization makes descriptors invariant to affine changes in intensity values.
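A minimal sketch of this extraction, assuming a grayscale image, a point at least 20 pixels from the border, and a sigma I picked for illustration:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def extract_descriptor(gray, row, col):
        """8x8 bias/gain-normalized descriptor from a 40x40 patch."""
        patch = gray[row - 20:row + 20, col - 20:col + 20]
        blurred = gaussian_filter(patch, sigma=2)  # low-pass before downsampling
        desc = blurred[::5, ::5]                   # 40x40 -> 8x8, spacing of 5 pixels
        return (desc - desc.mean()) / desc.std()   # mean 0, std 1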
[Figure: a descriptor of univhall_2.jpg after the low-pass filter and bias/gain normalization, with the original 40x40 patch]
[Figure: a descriptor of night_2.jpg after the low-pass filter and bias/gain normalization, with the original 40x40 patch]
Feature Matching
To match feature descriptors between two images, I first computed the pairwise L2 distances between all descriptors in one image and all descriptors in the other image. For each descriptor, I then found its nearest neighbor in the distance matrix, which is its potential matching descriptor in the other image.
To increase the certainty of each match, I used Lowe's trick, which thresholds the ratio between the distances to the nearest neighbor and the second nearest neighbor. The ratio is only below the threshold if the nearest neighbor is a much better match than the second nearest neighbor (and all other, farther candidates), and thus is more likely to be the true match.
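A minimal sketch of this matching step, assuming desc1 and desc2 are arrays of flattened descriptors:

    import numpy as np

    def match_features(desc1, desc2, threshold=0.8):
        """Return (i, j) index pairs that pass Lowe's ratio test."""
        # Pairwise L2 distances between all descriptors
        dists = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=-1)
        nn = np.argsort(dists, axis=1)              # neighbors sorted per row
        rows = np.arange(len(desc1))
        first = dists[rows, nn[:, 0]]               # nearest-neighbor distance
        second = dists[rows, nn[:, 1]]              # second-nearest distance
        keep = first / second < threshold           # Lowe's ratio test
        return np.stack([rows[keep], nn[keep, 0]], axis=1)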
[Figure: entrance_1.jpg & entrance_2.jpg, matched correspondences with threshold = 0.8]
While most of the matched correspondences looked correct, some visible outliers were present in the result images.
RANSAC for Robust Homography Estimation
As seen in the results of the last part, there are still some incorrect matches between correspondences. Since least-squares estimation is sensitive to outliers, I used the Random Sample Consensus (RANSAC) algorithm to increase the robustness of the homography estimation. In each iteration, RANSAC randomly picks 4 pairs of correspondences from the result of the last part and computes the exact homography from these 4 pairs. Then, it applies the homography to all points and counts the number of inliers: points whose distance to their matched correspondences after warping is less than 2 pixels. The algorithm keeps track of the largest set of inliers throughout the iterations, and after enough of them (5000 in this case), uses this largest set of inliers to compute the least-squares homography. Using RANSAC, I was able to filter out the correspondences that do not follow the majority homography, thus increasing the robustness of the homography estimation against outliers.
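A minimal sketch of this loop, reusing the hypothetical compute_homography from part A:

    import numpy as np

    def ransac_homography(src, dst, n_iters=5000, eps=2.0):
        """Estimate a homography from the largest consensus set of matches."""
        best = np.zeros(len(src), dtype=bool)
        src_h = np.hstack([src, np.ones((len(src), 1))])   # homogeneous coords
        for _ in range(n_iters):
            idx = np.random.choice(len(src), 4, replace=False)
            H = compute_homography(src[idx], dst[idx])     # exact fit on 4 pairs
            warped = src_h @ H.T
            warped = warped[:, :2] / warped[:, 2:3]        # divide out w
            inliers = np.linalg.norm(warped - dst, axis=1) < eps
            if inliers.sum() > best.sum():
                best = inliers
        return compute_homography(src[best], dst[best])    # final least squares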
[Figure: entrance_1.jpg & entrance_2.jpg after RANSAC with 5000 iterations]
Results using Autostitching
I applied all of the steps above to the same images from part A; the results are shown below with the manually and automatically stitched mosaics side by side.
[Figures: univhall, entrance, and night mosaics, manually stitched vs. autostitched]
By showing the manually and automatically stitched results next to each other, it is easy to see that autostitching produces clearer results and reduces visual artifacts such as ghosting, since it is not affected by the noise and error introduced by manual input.
[Figure: left: manually stitched, right: autostitched]
The coolest thing I learned from this project is how the computer "understands" an image completely differently from humans, yet we can use techniques like Harris corner detection and feature matching to achieve the same, or even better, results. The moment when clever techniques like ANMS and RANSAC filtered out all the redundant data and kept only the best correspondences was also very satisfying.