Introduction
The first part of this project walks through the basics of stitching image mosaics, including computing homographies from correspondence points, using the homographies to warp images so that the correspondence points overlap with each other, and blending the warped images using Laplacian pyramids.
Part A: Image Warping and Mosaicing
Shoot the Pictures
To capture sets of photos that are projective transformations of each other, I need to fix the center of projection for each set. Therefore, I held my phone such that the camera stayed on the rotation axis, and made sure each image in a set has about 40% to 70% overlap with the center image.
[Figure: univhall_1.jpg, univhall_2.jpg (center image), univhall_3.jpg]
[Figure: entrance_1.jpg, entrance_2.jpg (center image), entrance_3.jpg]
[Figure: night_1.jpg, night_2.jpg (center image), night_3.jpg]
Recover Homographies
For each pair of images, I need to recover the projective transformation below, where $H$ is the homography matrix with 8 degrees of freedom:

$$\begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

After manually entering the correspondence points for each pair of images using ginput, I used np.linalg.pinv to calculate the least-squares solution for $H$ in the following system, where each $(x, y)$ and $(x', y')$ is a pair of correspondence points in the source and the target image:

$$\begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -x x' & -y x' \\ 0 & 0 & 0 & x & y & 1 & -x y' & -y y' \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix}$$

Each coordinate $(x, y)$ in the source image can be transformed to the target coordinate $(x', y')$ by multiplying by $H$ and then dividing by $w$, which acts as a scaling factor.
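To make the setup concrete, here is a minimal numpy sketch of this least-squares solve, assuming src and dst are (N, 2) arrays of correspondence points (the function name is my own):

    import numpy as np

    def compute_homography(src, dst):
        """Least-squares homography mapping src points to dst points."""
        A, b = [], []
        for (x, y), (xp, yp) in zip(src, dst):
            A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
            A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
            b.extend([xp, yp])
        h = np.linalg.pinv(np.array(A)) @ np.array(b)  # solve for a..h
        return np.append(h, 1).reshape(3, 3)           # fix H[2, 2] = 1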
Warp the Images
After calculating the homography for each pair of images, I warped the source image to the target image using the following warping algorithm:
1. Get the coordinates of the 4 corners of the source image and apply the homography to them. Use skimage.draw.polygon on the resulting coordinates to get all pixels in the transformed polygon that will be used in inverse warping, as well as to predict the size of the warped image.
2. For each pixel in this polygon, apply the inverse of the homography to its coordinates to get the sampling coordinate in the source image, and then interpolate this pixel from the source image using scipy.interpolate.griddata.
3. For pixels without any values in the warped image, set the values of the alpha channel to 0. This helps with image blending.
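The whole procedure can be condensed into the sketch below; for brevity this version samples with nearest-neighbor lookups rather than scipy.interpolate.griddata, and it assumes the warped polygon lands in non-negative coordinates:

    import numpy as np
    import skimage.draw

    def warp_image(img, H):
        """Inverse-warp an RGB image by homography H, returning an RGBA result."""
        h, w = img.shape[:2]
        corners = np.array([[0, 0, 1], [w, 0, 1], [w, h, 1], [0, h, 1]]).T
        proj = H @ corners
        proj = (proj[:2] / proj[2]).T                      # divide out w
        out_h = int(np.ceil(proj[:, 1].max()))
        out_w = int(np.ceil(proj[:, 0].max()))
        # Step 1: all output pixels inside the transformed polygon
        rr, cc = skimage.draw.polygon(proj[:, 1], proj[:, 0], (out_h, out_w))
        # Step 2: inverse-map each output pixel to a source coordinate
        Hinv = np.linalg.inv(H)
        pts = Hinv @ np.stack([cc, rr, np.ones_like(rr)])
        xs, ys = pts[0] / pts[2], pts[1] / pts[2]
        inside = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
        rr, cc = rr[inside], cc[inside]
        # Step 3: alpha stays 0 outside the polygon, set to 1 inside
        out = np.zeros((out_h, out_w, 4))
        out[rr, cc, :3] = img[ys[inside].astype(int), xs[inside].astype(int), :3]
        out[rr, cc, 3] = 1
        return out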
Image Rectification
With the above algorithms, I can use projective transformations to warp images into each other by defining correspondences. For image rectification, I manually marked the four corners of a known rectangular object in the input image, and then defined the correspondences using a function that calculates the corner coordinates of an axis-aligned rectangle with a size similar to the marked corners.
[Figure: computer.png + correspondences, and the rectified result]
[Figure: roof.jpg + correspondences, and the rectified result]
Using the four pairs of correspondences, I recovered the homography between the input image and the calculated rectangle, and then used the warping algorithm described above to warp the input image such that the known rectangular object is now a rectangle in the output image.
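As an illustration, a hypothetical helper for that rectangle calculation (the corner ordering is my assumption):

    import numpy as np

    def rectangle_from_corners(corners):
        """Axis-aligned rectangle sized like the marked quadrilateral.

        corners: (4, 2) array of (x, y) points in the order
        top-left, top-right, bottom-right, bottom-left.
        """
        width = np.linalg.norm(corners[1] - corners[0])    # top edge length
        height = np.linalg.norm(corners[3] - corners[0])   # left edge length
        rect = np.array([[0, 0], [width, 0], [width, height], [0, height]])
        return rect + corners[0]                           # anchor at the top-left corner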
Blend the Images into a Mosaic
To stitch a set of three photos into a mosaic, I defined the base photo as the "center" photo in the set, and marked the correspondence points between the base photo and the other two photos. In this case, I marked 9 points for each image, as this overdetermined system is relatively more stable and less prone to noise compared to only 4 points. Then, I warped the other photos into the base photo using the algorithms described above. However, when combining the warped photos into the mosaic, simply overwriting the photos with each other leads to visible edge artifacts.
[Figure: naive blending with visible edge artifacts]
To remove these artifacts, I first calculated a distance transform mask for each warped photo, where each pixel in the mask stores its Euclidean distance to the closest boundary of the warped image.
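A minimal sketch of this mask, assuming each warped photo carries the alpha channel set during warping:

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def distance_mask(warped):
        """Distance of each valid pixel to the nearest boundary of the warp."""
        valid = warped[..., 3] > 0    # alpha marks pixels that hold real values
        padded = np.pad(valid, 1)     # zero-pad so the outer edge counts as boundary
        return distance_transform_edt(padded)[1:-1, 1:-1]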
[Figure: univhall_1.jpg, univhall_2.jpg, and univhall_3.jpg with their distance transforms]
After getting the distance transform masks, I computed a 2-level Laplacian pyramid for each photo to produce a low-pass filtered blurred photo and a high-pass filtered photo with only high-frequency details, and then blended the low-pass photos and high-pass photos using different methods.
The overlapping region of each pair of low-pass photos is blended using a weighted linear combination, where the weights are determined by the distance transform masks, such that pixels closer to the center of each photo have higher weights.
The overlapping region of each pair of high-pass photos is blended using a binary mask calculated from the distance transform masks, such that each pixel in the overlapping region takes its value from the high-pass photo with the higher distance transform value.
Using this blending method, the result has smoother transitions and far fewer artifacts.
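The following sketch shows this two-band blend, assuming the photos are already warped into a shared mosaic frame and paired with the masks from above (the Gaussian sigma is an assumed value):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def two_band_blend(photos, masks):
        """Blend aligned (H, W, 3) photos using their distance-transform masks."""
        low_sum = np.zeros_like(photos[0])
        weights = np.zeros(photos[0].shape[:2])
        high = np.zeros_like(photos[0])
        best = np.full(photos[0].shape[:2], -1.0)
        for img, m in zip(photos, masks):
            low = gaussian_filter(img, sigma=(2, 2, 0))  # low-pass band
            hi = img - low                               # high-pass band
            low_sum += low * m[..., None]                # distance-weighted average
            weights += m
            take = m > best                              # winner-take-all for details
            high[take] = hi[take]
            best = np.maximum(best, m)
        return low_sum / np.maximum(weights, 1e-8)[..., None] + high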
[Figure: univhall_1.jpg, univhall_2.jpg + correspondences, univhall_3.jpg, and the univhall mosaic]
[Figure: entrance_1.jpg, entrance_2.jpg + correspondences, entrance_3.jpg, and the entrance mosaic]
[Figure: night_1.jpg, night_2.jpg + correspondences, night_3.jpg, and the night mosaic]
Part B: Feature Matching for Autostitching
For part B, I followed the paper "Multi-Image Matching using Multi-Scale Oriented Patches" (Brown, Szeliski, and Winder, 2005) to implement a system for automatically stitching images into a mosaic.
Harris Interest Point Detector
To automatically find interest points that can be matched into correspondences, I used the Harris interest point detector to locate "corners": points whose corner strength, a measure derived from the sum of squared differences between the surrounding image patch and its slightly shifted copies, is above a certain threshold.
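A minimal sketch of how such a detector can be invoked through skimage (the relative threshold is an assumed cutoff):

    from skimage.feature import corner_harris, peak_local_max

    def harris_corners(gray, min_distance=1):
        """Corner strength map and the (row, col) coordinates of its peaks."""
        strength = corner_harris(gray)               # per-pixel corner response
        coords = peak_local_max(strength, min_distance=min_distance,
                                threshold_rel=0.01)  # keep clear local maxima
        return strength, coords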
[Figure: univhall_2.jpg, and univhall_2.jpg with Harris corners overlaid]
As we can see in the image above with all Harris corners overlaid, this detector produces an extremely dense set of Harris corners without any suppression.
Adaptive Non-Maximal Suppression
Since too many interest points cause redundant computation, I used the Adaptive Non-Maximal Suppression (ANMS) algorithm, which reduces the number of interest points while distributing them evenly throughout the image. In ANMS, only interest points that have the locally maximal corner strength within a neighbourhood of radius $r$ pixels are retained. The algorithm starts with the maximum $r$, which finds the interest point with the globally maximal corner strength, and gradually decreases $r$, adding more locally maximal interest points to the list, until the number of interest points in the list reaches the target amount, which in my case is the top 500 points for each image.
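In practice this is usually computed per point as a suppression radius rather than by literally sweeping $r$; a minimal sketch under that formulation, with c_robust as the robustness factor from the paper:

    import numpy as np

    def anms(coords, strength, n_ip=500, c_robust=0.9):
        """Keep the n_ip interest points with the largest suppression radii."""
        # Squared distances between every pair of interest points
        d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
        # Point j suppresses point i only if i is significantly weaker than j
        suppresses = strength[:, None] < c_robust * strength[None, :]
        r = np.where(suppresses, d2, np.inf).min(axis=1)  # radius per point
        keep = np.argsort(-r)[:n_ip]                      # global max has r = inf
        return coords[keep]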
[Figure: univhall_2.jpg without suppression, and after ANMS with n_ip = 500 and c_robust = 0.9]
Feature Descriptor Extraction
After ANMS returns the interest points, the next step is to match each interest point with its correspondence in the other image. To do this, I extracted feature descriptors by sampling a 40x40 patch around each interest point and downsampling the large patches into 8x8 descriptors using a Gaussian low-pass filter. Since the patches are blurred by the low-pass filter before downsampling, aliasing is avoided.
Each 8x8 descriptor is also bias/gain-normalized such that for each descriptor, the mean of the pixel intensity is 0 and the standard deviation is 1. This normalization makes descriptors invariant to affine changes in intensity values.
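A minimal sketch of this extraction, assuming a grayscale image, a point at least 20 pixels from the border, and a sigma I picked for illustration:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def extract_descriptor(gray, row, col):
        """8x8 bias/gain-normalized descriptor from a 40x40 patch."""
        patch = gray[row - 20:row + 20, col - 20:col + 20]
        blurred = gaussian_filter(patch, sigma=2)  # low-pass before downsampling
        desc = blurred[::5, ::5]                   # 40x40 -> 8x8, spacing of 5 pixels
        return (desc - desc.mean()) / desc.std()   # mean 0, std 1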
[Figure: a descriptor of univhall_2.jpg after the low-pass filter and bias/gain normalization, with the original 40x40 patch]
[Figure: a descriptor of night_2.jpg after the low-pass filter and bias/gain normalization, with the original 40x40 patch]
Feature Matching
To match feature descriptors between two images, I first computed the pairwise L2 distances between all descriptors in one image and all descriptors in the other image. For each descriptor, I then found its nearest neighbor in the distance matrix, which is its potential matching descriptor in the other image.
To increase the certainty of each match, I used Lowe's trick, which thresholds the ratio between the distances to the nearest neighbor and the second nearest neighbor. The ratio is only below the threshold if the nearest neighbor is a much better match than the second nearest neighbor (and all other, farther candidates), and thus is more likely to be the true match.
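A minimal sketch of this matching step, assuming desc1 and desc2 are arrays of flattened descriptors:

    import numpy as np

    def match_features(desc1, desc2, threshold=0.8):
        """Return (i, j) index pairs that pass Lowe's ratio test."""
        # Pairwise L2 distances between all descriptors
        dists = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=-1)
        nn = np.argsort(dists, axis=1)              # neighbors sorted per row
        rows = np.arange(len(desc1))
        first = dists[rows, nn[:, 0]]               # nearest-neighbor distance
        second = dists[rows, nn[:, 1]]              # second-nearest distance
        keep = first / second < threshold           # Lowe's ratio test
        return np.stack([rows[keep], nn[keep, 0]], axis=1)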
[Figure: entrance_1.jpg & entrance_2.jpg, matched correspondences with threshold = 0.8]
While most of the matched correspondences looked correct, some visible outliers were present in the result images.
RANSAC for Robust Homography Estimation
As seen in the results of the last part, there are still some incorrect matches between correspondences. Since least-squares estimation is sensitive to outliers, I used the Random Sample Consensus (RANSAC) algorithm to increase the robustness of the homography estimation. In each iteration, RANSAC randomly picks 4 pairs of correspondences from the result of the last part and computes the exact homography from these 4 pairs. Then, it applies the homography to all points and counts the number of inliers: points whose distance to their matched correspondences after warping is less than 2 pixels. The algorithm keeps track of the largest set of inliers throughout the iterations, and after enough of them (5000 in this case), uses this largest set of inliers to compute the least-squares homography. Using RANSAC, I was able to filter out the correspondences that do not follow the majority homography, thus increasing the robustness of the homography estimation against outliers.
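A minimal sketch of this loop, reusing the hypothetical compute_homography from part A:

    import numpy as np

    def ransac_homography(src, dst, n_iters=5000, eps=2.0):
        """Estimate a homography from the largest consensus set of matches."""
        best = np.zeros(len(src), dtype=bool)
        src_h = np.hstack([src, np.ones((len(src), 1))])   # homogeneous coords
        for _ in range(n_iters):
            idx = np.random.choice(len(src), 4, replace=False)
            H = compute_homography(src[idx], dst[idx])     # exact fit on 4 pairs
            warped = src_h @ H.T
            warped = warped[:, :2] / warped[:, 2:3]        # divide out w
            inliers = np.linalg.norm(warped - dst, axis=1) < eps
            if inliers.sum() > best.sum():
                best = inliers
        return compute_homography(src[best], dst[best])    # final least squares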
[Figure: entrance_1.jpg & entrance_2.jpg after RANSAC with 5000 iterations]
Results using Autostitching
I applied all of the steps above to the same images from part A; the results are shown below with the manually and automatically stitched mosaics side by side.
[Figures: univhall, entrance, and night mosaics, manually stitched vs. autostitched]
By showing the manually and automatically stitched results next to each other, it is easy to see that autostitching produces clearer results and reduces visual artifacts such as ghosting, since it is not affected by the noise and error introduced by manual input.
[Figure: left: manually stitched, right: autostitched]
The coolest thing I learned from this project is how the computer "understands" an image completely differently from humans, yet we can use techniques like Harris corner detection and feature matching to achieve the same, or even better, results. The moment when clever techniques like ANMS and RANSAC filtered out all the redundant data and kept only the best correspondences was also very satisfying.