Table Understanding: Object Detection

R-CNN

Input the image
Region proposal: Selective Search, etc.
warp each region proposal to a 227*227 image then input to CNN, fetch the output of 7 fully connected layers as features
input features into SVM for classification

不是简单的回归预测

RPN:

(1) 给个一个输入图像，首先将图像划分成7*7的网格 (2) 对于每个网格，我们都预测2个边框（包括每个边框是目标的置信度以及每个边框区域在多个类别上的概率） (3) 根据上一步可以预测出7*7*2个目标窗口，然后根据阈值去除可能性比较低的目标窗口，最后NMS去除冗余窗口即可