R-FCN: Object Detection via Region-based Fully Convolutional Networks

Jifeng Dai, Yi Li, Kaiming He, Jian Sun

Neural Information Processing Systems

VOC datasets (e.g., 83.6% mAP on the 2007 set) with the 101-layer ResNet. Meanwhile, our result is achieved at a test-time speed of 170 ms per image, 2.5-20