Object detection
# Code
# References
YOLO. You Only Look Once: Unified, Real-Time Object Detection (Redmon 2016)
- YOLO, or You Only Look Once, is an object detection algorithm quite different from the region-based algorithms. In YOLO, a single convolutional network predicts both the bounding boxes and the class probabilities for those boxes in one pass over the image.
- YOLO splits the image into an S × S grid. Each grid cell predicts B bounding boxes, each made of offset values (x, y, w, h) and a confidence score, together with class probabilities for that cell. For C classes the network output is therefore an S × S × (B·5 + C) tensor (7 × 7 × 30 in the paper, with S = 7, B = 2, C = 20). Boxes whose class-specific score (box confidence times class probability) exceeds a threshold are kept and used to locate objects in the image, and non-maximum suppression removes duplicate detections.
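A minimal sketch of the decoding and thresholding step, assuming the YOLOv1 output layout: each of the S × S cells holds B blocks of (x, y, w, h, confidence) followed by C class probabilities shared by the cell. The nested-list input format, the function name `decode_yolo_grid`, and the 0.2 default threshold are illustrative assumptions. Non-maximum suppression is omitted.

```python
from typing import List, Tuple

# A decoded detection: (class_id, score, x_center, y_center, width, height),
# with coordinates normalized to [0, 1] relative to the whole image.
Detection = Tuple[int, float, float, float, float, float]


def decode_yolo_grid(pred: List[List[List[float]]], num_boxes: int,
                     num_classes: int, threshold: float = 0.2) -> List[Detection]:
    """Turn an S x S x (B*5 + C) prediction grid into thresholded boxes.

    Assumed (hypothetical) layout per cell: B blocks of (x, y, w, h, confidence),
    followed by C class probabilities shared by every box in that cell.
    """
    s = len(pred)
    detections: List[Detection] = []
    for row in range(s):
        for col in range(s):
            cell = pred[row][col]
            class_probs = cell[num_boxes * 5: num_boxes * 5 + num_classes]
            for b in range(num_boxes):
                x, y, w, h, conf = cell[b * 5: b * 5 + 5]
                # Class-specific score = box confidence * class probability.
                scores = [conf * p for p in class_probs]
                best = max(range(num_classes), key=lambda c: scores[c])
                if scores[best] >= threshold:
                    # (x, y) are offsets inside the cell; convert to image coordinates.
                    cx = (col + x) / s
                    cy = (row + y) / s
                    detections.append((best, scores[best], cx, cy, w, h))
    return detections


# Toy 2 x 2 grid with one box per cell and two classes.
S, B, C = 2, 1, 2
grid = [[[0.0] * (B * 5 + C) for _ in range(S)] for _ in range(S)]
grid[1][0] = [0.5, 0.5, 0.3, 0.4, 0.9, 0.1, 0.9]  # one confident box in the bottom-left cell
print(decode_yolo_grid(grid, B, C))
```

Only the bottom-left cell survives the threshold; a full detector would then run non-maximum suppression over the surviving boxes.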
EfficientDet: Scalable and Efficient Object Detection (Tan 2020)
Region-based CNNs (R-CNNs):
R-CNN (Regions with CNN features)
- The goal of R-CNN is to take in an image and correctly identify where the main objects in the image are, each marked by a bounding box.
- R-CNN creates these bounding boxes, or region proposals, using a process called Selective Search.
- Once the proposals are created, R-CNN warps each region to a standard square size and passes it through a modified version of AlexNet (the winning entry in the ImageNet ILSVRC 2012 challenge, which inspired R-CNN).
- On the CNN's final feature layer, R-CNN adds class-specific linear Support Vector Machines (SVMs) that classify whether the region is an object and, if so, which object.
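The warp step can be sketched as a crop followed by an anisotropic resize to a fixed square (227 × 227 pixels for AlexNet in the R-CNN paper). This is a simplified, hypothetical `warp_region` helper on a single-channel image stored as nested lists; it uses nearest-neighbor sampling and skips the context padding the paper adds around each box.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def warp_region(image: Image, box: Tuple[int, int, int, int], size: int) -> Image:
    """Crop box = (x0, y0, x1, y1) (x1, y1 exclusive) and resize it to size x size.

    The crop is stretched independently along each axis, so the aspect ratio
    is not preserved; nearest-neighbor sampling keeps the sketch short.
    """
    x0, y0, x1, y1 = box
    crop_h, crop_w = y1 - y0, x1 - x0
    out: Image = []
    for i in range(size):
        # Map output row i back to a source row inside the crop.
        src_y = y0 + min(crop_h - 1, int(i * crop_h / size))
        row = []
        for j in range(size):
            # Map output column j back to a source column inside the crop.
            src_x = x0 + min(crop_w - 1, int(j * crop_w / size))
            row.append(image[src_y][src_x])
        out.append(row)
    return out


# Example: warp a tall 4 x 2 proposal from a 6 x 6 image into a 4 x 4 square.
img = [[float(10 * r + c) for c in range(6)] for r in range(6)]
warped = warp_region(img, (1, 1, 3, 5), size=4)
print(warped[0])  # → [11.0, 11.0, 12.0, 12.0]
```

Each warped patch then goes through the CNN separately, so an image with about 2,000 proposals needs about 2,000 forward passes; this per-region cost is what Fast R-CNN targets.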
Fast R-CNN
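- RoI (Region of Interest) Pooling. Instead of running the CNN once per proposal, Fast R-CNN runs it once over the whole image, and RoIPool max-pools each proposal's window of that shared feature map into a fixed-size grid. At its core, RoIPool shares one forward pass of the CNN across all subregions of the image.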
- The second insight of Fast R-CNN is to jointly train the CNN, classifier, and bounding box regressor in a single model, with a softmax classifier replacing R-CNN's per-class SVMs.
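A minimal single-channel sketch of RoI max pooling, assuming each proposal has already been projected onto the feature map and rounded to integer cell coordinates. The helper name `roi_max_pool` is hypothetical; the floor/ceil bin boundaries follow a common convention, and the fixed output size (7 × 7 for VGG16 in the Fast R-CNN paper) is left as a parameter.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[float]]  # one channel, indexed [row][col]


def roi_max_pool(fmap: FeatureMap, roi: Tuple[int, int, int, int],
                 out_h: int, out_w: int) -> FeatureMap:
    """Max-pool roi = (x0, y0, x1, y1) (inclusive feature-map cells)
    into a fixed out_h x out_w grid, whatever the RoI's size."""
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0 + 1, x1 - x0 + 1
    pooled: FeatureMap = []
    for i in range(out_h):
        # Integer bin boundaries: floor for the start, ceil for the end,
        # so every bin covers at least one cell.
        ys = y0 + math.floor(i * roi_h / out_h)
        ye = y0 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            xs = x0 + math.floor(j * roi_w / out_w)
            xe = x0 + math.ceil((j + 1) * roi_w / out_w)
            row.append(max(fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# One shared 4 x 4 feature map (computed once per image), pooled for two proposals.
fmap = [[float(4 * r + c) for c in range(4)] for r in range(4)]
print(roi_max_pool(fmap, (0, 0, 3, 3), 2, 2))  # → [[5.0, 7.0], [13.0, 15.0]]
print(roi_max_pool(fmap, (1, 1, 2, 3), 2, 2))  # → [[9.0, 10.0], [13.0, 14.0]]
```

Both proposals read from the same `fmap`, and both come out as 2 × 2 grids even though their windows differ in size. That fixed shape is what lets the fully connected classifier and box-regression layers process every proposal.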
Faster R-CNN
- The insight of Faster R-CNN was that region proposals depended on features of the image that were already calculated with the forward pass of the CNN (first step of classification).
- So why not reuse those same CNN results for region proposals instead of running a separate selective search algorithm?
- A single CNN is used for both region proposals and classification. This way, only one CNN needs to be trained and region proposals come almost for free. Faster R-CNN adds a Fully Convolutional Network on top of the CNN's features, creating what is known as the Region Proposal Network (RPN).
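In the Faster R-CNN paper, the RPN scores reference boxes called anchors at every feature-map position and regresses offsets relative to them. By default there are 3 scales (box areas 128², 256², 512² pixels) times 3 aspect ratios (1:1, 1:2, 2:1), giving 9 anchors per position. The sketch below generates those anchors for a feature map with stride 16 (the VGG-16 conv5 stride). The helper name `generate_anchors` and the choice to center each anchor in its stride cell are assumptions; reference implementations use slightly different offsets.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in image pixels


def generate_anchors(feat_h: int, feat_w: int, stride: int = 16,
                     scales: Tuple[int, ...] = (128, 256, 512),
                     ratios: Tuple[float, ...] = (0.5, 1.0, 2.0)) -> List[Box]:
    """Return len(scales) * len(ratios) anchors for every feature-map cell.

    ratio is height / width; every anchor of a given scale has area scale**2.
    """
    anchors: List[Box] = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            # Center of this feature cell projected back to image pixels (assumed convention).
            cx = (fx + 0.5) * stride
            cy = (fy + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(feat_h=2, feat_w=3)
print(len(anchors))  # → 54  (2 * 3 positions * 9 anchors)
print(anchors[1])    # → (-56.0, -56.0, 72.0, 72.0)  (128 x 128 anchor at the first cell)
```

The RPN's convolutional head then predicts, for each of these anchors, an objectness score and four box offsets. The top-scoring refined boxes become the region proposals passed to the RoI pooling and classification stage.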
Mask R-CNN (He 2018)
- Extending Faster R-CNN for Pixel-Level Segmentation
- Mask R-CNN does this by adding a branch to Faster R-CNN that outputs a binary mask saying whether or not a given pixel is part of an object. The branch, as before, is just a Fully Convolutional Network on top of a CNN-based feature map.
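- But the Mask R-CNN authors had to make one small adjustment for this pipeline to work as expected: realigning RoIPool to be more accurate. RoIPool rounds the RoI boundaries and bin edges to whole feature-map cells, which misaligns the pooled features with the image. The replacement, RoIAlign, keeps the exact floating-point coordinates and computes feature values by bilinear interpolation.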
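A minimal single-channel sketch of RoIAlign. No coordinate is rounded. Each output bin samples a few regularly spaced points, reads the feature map at those fractional positions with bilinear interpolation, and averages the results. The Mask R-CNN paper describes four sampled points per bin, aggregated by max or average. The 2 × 2 sampling grid, average aggregation, the convention that feature values sit at integer coordinates, and the helper names `bilinear` and `roi_align` are assumptions of this sketch.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[float]]  # one channel, indexed [row][col]


def bilinear(fmap: FeatureMap, y: float, x: float) -> float:
    """Interpolate fmap at fractional (y, x); values sit at integer coordinates."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)  # clamp to the map's extent
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(fmap: FeatureMap, roi: Tuple[float, float, float, float],
              out_h: int, out_w: int, samples: int = 2) -> FeatureMap:
    """Average-pool roi = (x0, y0, x1, y1), given in continuous feature-map
    coordinates, into out_h x out_w bins without rounding any coordinate."""
    x0, y0, x1, y1 = roi
    bin_h = (y1 - y0) / out_h
    bin_w = (x1 - x0) / out_w
    out: FeatureMap = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # samples x samples regularly spaced points inside bin (i, j).
            for sy in range(samples):
                for sx in range(samples):
                    y = y0 + (i + (sy + 0.5) / samples) * bin_h
                    x = x0 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear(fmap, y, x)
            row.append(total / (samples * samples))
        out.append(row)
    return out


fmap = [[float(4 * r + c) for c in range(4)] for r in range(4)]
# A RoI with fractional corners keeps them exactly instead of snapping to cells.
print(roi_align(fmap, (0.5, 0.5, 2.5, 2.5), 1, 1))  # → [[7.5]]
```

RoIPool would snap the corners (0.5, 0.5) and (2.5, 2.5) to whole cells before pooling. Here the single bin averages values read at the exact sub-cell points, and that sub-pixel alignment is what matters for the per-pixel mask branch.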
