KARPE/DIEM/

Github Repository

https://github.com/kedarkarpe/Faster-RCNN

Introduction

In this project we will implement some of the components of MaskRCNN, an algorithm that addresses the task of instance segmentation, which combines object detection and semantic segmentation into a per-pixel object detection framework.

The full implementation of Faster-RCNN would take many days to train, so we will implement a simpler version of the Region Proposal Network that keeps all the necessary components. However, these simplifications affect the performance of the algorithm. In the second parts of the project, we will use pretrained parts to boost the performance.

Object Detection - State of the Art.

Model Architecture

The architecture for the RPN and the later refinement of the proposals is shown below: Model Architecture.

Part 1 - Region Proposal Networks

Region Proposal Networks (RPNs) are ”attention mechanisms” for the object detection task, performing a crude but inexpensive first estimation of where the bounding boxes of the objects should be. They were first proposed as a way to address the issue of expensive greedy algorithms like Selective Search, opening new avenues to end-to-end object detection tasks. They work through classifying the initial anchor boxes into object/background and refine the coordinates for the boxes with objects. Later, these boxes will be further refined and tightened by the instance segmentation heads as well as classified in their corresponding classes.

Architecture of the Region Proposal Network.

1. Ground Truth Creation

Ground Truth Proposal Regions. Top 20 Ground Truth proposals.

2. Training and Loss Plots

Fire test of the inkjet cartridge driver. First power-on test of the robot.

3. Predicted Outputs

Pre-NMS Predicted Proposals. Post-NMS Predicted Proposals.

Part 2 - Object Detection

Classifier Head

The classifier head in Faster-RCNN is responsible for refining the proposals generated by the Region Proposal Network (RPN) and classifying them into specific object categories. It consists of two main components: a bounding box regressor and a classifier.

1. Bounding Box Regressor

The bounding box regressor refines the coordinates of the proposals generated by the RPN. It takes the feature maps corresponding to each proposal and predicts the offsets that need to be applied to the proposal coordinates to better fit the object.

2. Classifier

The classifier assigns a class label to each proposal. It takes the feature maps corresponding to each proposal and predicts the probability distribution over all possible object classes, including the background class.

Training

The classifier head is trained using a multi-task loss that combines the classification loss and the bounding box regression loss. The classification loss is typically a cross-entropy loss, while the bounding box regression loss is a smooth L1 loss.

Loss Function

The total loss for the classifier head is given by: \(L = L_{cls} + \lambda L_{reg}\) where \(L_{cls}\) is the classification loss, \(L_{reg}\) is the bounding box regression loss, and \(\lambda\) is a weighting factor that balances the two losses.

References

Original YOLO paper: https://arxiv.org/pdf/1506.02640.pdf
Intuitive Explanation: https://towardsdatascience.com/yolo-you-only-look-once-real-time-object-detection-explained-492dc9230006
YOLO Video Tutorial: https://www.youtube.com/watch?v=9s_FpMpdYW8&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=30
mean Average Precision: https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173
Intersection over Union: https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection

Faster-RCNN: Towards Real-Time Object Detection with Region Proposal Networks