The idea of the camera course is to build a collision detection system - that's the overall goal for the Final Project. As preparation for this, you will now build the feature tracking part and test various detector / descriptor combinations to see which ones perform best. This mid-term project consists of four parts:

1. Setting up the data structures and loading the image sequence into a ring buffer of fixed size.
2. Integrating several keypoint detectors and restricting the keypoints to a region of interest on the preceding vehicle.
3. Extracting keypoint descriptors and matching them between successive images.
4. Testing the various detector / descriptor combinations and comparing them with regard to keypoint counts, match counts, and processing time.
See the classroom instructions and code comments for more details on each of these parts. Once you are finished with this project, the keypoint matching part will be set up and you can proceed to the next lesson, where the focus is on integrating Lidar points and on object detection using deep learning.
This README.md contains commentary on every item of the project rubric and as such serves as the mid-term report.
I use an STL vector for the dataBuffer objects and make sure its size does not exceed a fixed limit. This is achieved by removing the oldest image whenever a new image is added, as sketched below.
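A minimal sketch of this ring-buffer behavior, assuming a `DataFrame` struct and a `dataBufferSize` constant named as in the project starter code; the stand-in image is illustrative only.

```cpp
// Ring buffer sketch: keep at most dataBufferSize frames, evicting the oldest.
#include <cstddef>
#include <vector>
#include <opencv2/core.hpp>

struct DataFrame {       // all data referring to a single image in the sequence
    cv::Mat cameraImg;   // the camera image itself
};

int main() {
    const std::size_t dataBufferSize = 2;  // fixed upper limit on buffer size
    std::vector<DataFrame> dataBuffer;

    for (int i = 0; i < 10; ++i) {
        DataFrame frame;
        frame.cameraImg = cv::Mat::zeros(375, 1242, CV_8UC1); // stand-in image

        if (dataBuffer.size() >= dataBufferSize)
            dataBuffer.erase(dataBuffer.begin()); // drop the oldest frame
        dataBuffer.push_back(frame);              // append the newest frame
    }
    return 0;
}
```

An `std::deque` with `pop_front` would avoid the linear-time `erase`, but for a buffer of two or three frames the vector is perfectly adequate.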
I implement the HARRIS, FAST, BRISK, ORB, AKAZE, and SIFT detectors using OpenCV functionality. They are conveniently selectable in the source code by simply setting a string accordingly, roughly as in the sketch below.
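A hypothetical factory illustrating the string-based selection; the `cv::Feature2D` factory calls are standard OpenCV API (`cv::SIFT::create()` requires OpenCV 4.4 or newer), while the `createDetector` helper itself is an illustration, not the exact project code.

```cpp
#include <stdexcept>
#include <string>
#include <opencv2/features2d.hpp>

cv::Ptr<cv::Feature2D> createDetector(const std::string &detectorType) {
    // HARRIS and SHITOMASI use cv::cornerHarris / cv::goodFeaturesToTrack
    // rather than the cv::Feature2D interface, so they are handled elsewhere.
    if (detectorType == "FAST")  return cv::FastFeatureDetector::create();
    if (detectorType == "BRISK") return cv::BRISK::create();
    if (detectorType == "ORB")   return cv::ORB::create();
    if (detectorType == "AKAZE") return cv::AKAZE::create();
    if (detectorType == "SIFT")  return cv::SIFT::create(); // OpenCV >= 4.4
    throw std::invalid_argument("unknown detector type: " + detectorType);
}
```

A detector obtained this way is then invoked with `detector->detect(img, keypoints);`.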
I discard all keypoints outside of a rectangular region of interest (ROI) corresponding to the area of the image where the vehicle directly in front of the ego car is positioned. Only the keypoints within that ROI are used in the next tasks of this project.
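The filtering itself reduces to a point-in-rectangle test with `cv::Rect::contains`; in this sketch the concrete rectangle from the starter code is assumed.

```cpp
#include <vector>
#include <opencv2/core.hpp>

// Keep only the keypoints that fall inside the rectangle around the
// preceding vehicle; all others are discarded.
void filterKeypointsByROI(std::vector<cv::KeyPoint> &keypoints,
                          const cv::Rect &vehicleRect) {
    std::vector<cv::KeyPoint> inROI;
    for (const auto &kp : keypoints)
        if (vehicleRect.contains(kp.pt))  // point-in-rectangle test
            inROI.push_back(kp);
    keypoints = std::move(inROI);
}

// Usage with the ROI suggested by the starter code (values are an assumption):
// cv::Rect vehicleRect(535, 180, 180, 150);
// filterKeypointsByROI(keypoints, vehicleRect);
```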
I implement the BRIEF, ORB, FREAK, AKAZE and SIFT descriptors using OpenCV functionality. They are conveniently selectable in the source code by simply setting a string accordingly.
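Descriptor selection can follow the same factory pattern. This sketch assumes the opencv_contrib modules (`xfeatures2d`) are available for BRIEF and FREAK; BRISK, which also appears as a descriptor column in the tables below, is included for completeness.

```cpp
#include <stdexcept>
#include <string>
#include <opencv2/features2d.hpp>
#include <opencv2/xfeatures2d.hpp>  // BRIEF and FREAK live in opencv_contrib

cv::Ptr<cv::DescriptorExtractor> createExtractor(const std::string &descriptorType) {
    if (descriptorType == "BRISK") return cv::BRISK::create();
    if (descriptorType == "BRIEF") return cv::xfeatures2d::BriefDescriptorExtractor::create();
    if (descriptorType == "ORB")   return cv::ORB::create();
    if (descriptorType == "FREAK") return cv::xfeatures2d::FREAK::create();
    if (descriptorType == "AKAZE") return cv::AKAZE::create();
    if (descriptorType == "SIFT")  return cv::SIFT::create(); // OpenCV >= 4.4
    throw std::invalid_argument("unknown descriptor type: " + descriptorType);
}
```

Extraction then runs via `extractor->compute(img, keypoints, descriptors);`.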
I implement FLANN matching as well as k-Nearest-Neighbor selection. Both methods are selectable using the respective strings in the main function.
I use K-Nearest-Neighbor matching (k = 2) to implement the descriptor distance ratio test, which compares the distances of the best and second-best match to decide whether to keep an associated pair of keypoints.
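The matching path can be sketched as follows. The 0.8 ratio threshold is the value used in the course; converting binary descriptors to CV_32F before FLANN matching is a common workaround, since the KD-tree based FLANN matcher expects floating-point data. Function name and structure are illustrative.

```cpp
#include <vector>
#include <opencv2/features2d.hpp>

// FLANN-based k-nearest-neighbor matching with the distance ratio test.
std::vector<cv::DMatch> matchWithRatioTest(cv::Mat descSource, cv::Mat descRef) {
    // FLANN's KD-tree matcher expects float descriptors, so convert binary ones.
    if (descSource.type() != CV_32F) descSource.convertTo(descSource, CV_32F);
    if (descRef.type() != CV_32F)    descRef.convertTo(descRef, CV_32F);

    cv::Ptr<cv::DescriptorMatcher> matcher =
        cv::DescriptorMatcher::create(cv::DescriptorMatcher::FLANNBASED);

    std::vector<std::vector<cv::DMatch>> knnMatches;
    matcher->knnMatch(descSource, descRef, knnMatches, 2); // two best candidates

    // Keep a match only if the best distance is clearly below the second best.
    std::vector<cv::DMatch> goodMatches;
    const float ratioThreshold = 0.8f;
    for (const auto &pair : knnMatches)
        if (pair.size() == 2 && pair[0].distance < ratioThreshold * pair[1].distance)
            goodMatches.push_back(pair[0]);
    return goodMatches;
}
```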
The following table shows the total number of keypoints detected on the preceding vehicle, summed over all ten images of the dataset, for all implemented combinations of detectors (rows) and descriptors (columns). Since detection runs before descriptor extraction, the counts in each row are constant across descriptors; N/A marks combinations that do not work (the AKAZE descriptor requires AKAZE keypoints, and the ORB descriptor fails on SIFT keypoints):
The OpenCV library was compiled without CUDA support. The matcher MAT_FLANN, descriptor type DES_BINARY, and selector SEL_NN were used for all runs.
Detector | BRISK | BRIEF | ORB | FREAK | AKAZE | SIFT |
---|---|---|---|---|---|---|
SHITOMASI | 13423 | 13423 | 13423 | 13423 | N/A | 13423 |
HARRIS | 1737 | 1737 | 1737 | 1737 | N/A | 1737 |
FAST | 17874 | 17874 | 17874 | 17874 | N/A | 17874 |
BRISK | 27116 | 27116 | 27116 | 27116 | N/A | 27116 |
ORB | 5000 | 5000 | 5000 | 5000 | N/A | 5000 |
AKAZE | 13429 | 13429 | 13429 | 13429 | 13429 | 13429 |
SIFT | 13860 | 13860 | N/A | 13860 | N/A | 13860 |
The following table shows the total number of matched keypoints on the preceding vehicle, summed over all ten images of the dataset, for all implemented combinations of detectors (rows) and descriptors (columns):
Detector | BRISK | BRIEF | ORB | FREAK | AKAZE | SIFT |
---|---|---|---|---|---|---|
SHITOMASI | 1067 | 1067 | 1067 | 1067 | N/A | 1067 |
HARRIS | 214 | 214 | 214 | 214 | N/A | 214 |
FAST | 1348 | 1348 | 1348 | 1348 | N/A | 1348 |
BRISK | 2508 | 2508 | 2508 | 2326 | N/A | 2508 |
ORB | 950 | 1033 | 1033 | 549 | N/A | 1033 |
AKAZE | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 |
SIFT | 1249 | 1250 | N/A | 1240 | N/A | 1250 |
The following table shows the total keypoint detection time in milliseconds, summed over all ten images of the dataset, for all implemented combinations of detectors (rows) and descriptors (columns). Detection does not depend on the chosen descriptor, so the small variations along each row reflect run-to-run timing noise:
Detector | BRISK | BRIEF | ORB | FREAK | AKAZE | SIFT |
---|---|---|---|---|---|---|
SHITOMASI | 94.2 | 104 | 97.7 | 81.0 | N/A | 99.7 |
HARRIS | 95.5 | 98.1 | 109.9 | 92.4 | N/A | 107.6 |
FAST | 7.9 | 7.9 | 7.7 | 7.6 | N/A | 8.0 |
BRISK | 300.2 | 302 | 298.4 | 300.3 | N/A | 300.4 |
ORB | 146.5 | 149.1 | 154.4 | 146.5 | N/A | 159.3 |
AKAZE | 356.4 | 400.3 | 380.2 | 347.2 | 436.9 | 403.3 |
SIFT | 520.5 | 546.5 | N/A | 538.7 | N/A | 413.1 |
The following table shows the total descriptor extraction time in milliseconds, summed over all ten images of the dataset, for all implemented combinations of detectors (rows) and descriptors (columns):
Detector | BRISK | BRIEF | ORB | FREAK | AKAZE | SIFT |
---|---|---|---|---|---|---|
SHITOMASI | 12.4 | 7.5 | 6.6 | 197.4 | N/A | 96.4 |
HARRIS | 4.2 | 1.9 | 6.5 | 196.4 | N/A | 88.61 |
FAST | 14.3 | 7.7 | 8.4 | 205.8 | N/A | 101.8 |
BRISK | 23.1 | 5.7 | 34.3 | 205.0 | N/A | 138.0 |
ORB | 11.1 | 3.4 | 35.2 | 201.6 | N/A | 169.10 |
AKAZE | 13.8 | 5.8 | 24.2 | 216.7 | 345.4 | 112.4 |
SIFT | 11.6 | 6.4 | N/A | 203.1 | N/A | 370.5 |
Based on the data above, the top three detector/descriptor combinations recommended as the best choices for the purpose of detecting keypoints on vehicles are:

1. FAST + BRIEF: the fastest detector (7.9 ms total detection time) paired with the fastest descriptor (7.7 ms total extraction time), while still yielding a high number of matches (1348).
2. FAST + ORB: practically the same speed (7.7 ms detection, 8.4 ms extraction) with the same match count.
3. FAST + BRISK: slightly slower extraction (14.3 ms), but still several times faster end-to-end than any combination using a different detector, again with 1348 matches.
I also reduced the ROI window and tried a few more runs with different matchers and selectors, but overall the above recommendations still held. Given more time, I would compile OpenCV with CUDA support, but I have already spent a considerable amount of time on this project. I believe this work covers all points of the rubric appropriately.