State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection.
Accurately counting objects instances in a given image or video frame is a hard problem to solve in machine learning. A number of solutions have been developed to count people, cars and other objects and none of them is perfect. Of course, we are talking about image processing here, so a neural network seems to be a good tool for the job. Below you can find a description of different approaches, common problems, challenges and latest solutions in the Neural Networks object counting field. As a proof of concept, existing model for Faster R-CNN network will be used to count objects on the street with video examples given at the end of the post.
Throughout this article, I will discuss some of the more complex aspects of convolutional neural networks and how they related to specific tasks such as object detection and facial recognition. This article is a natural extension to my article titled: Simple Introductions to Neural Networks. I recommend looking at this before tackling the rest of this article if you are not well-versed in the idea and function of convolutional neural networks. Due to the excessive length of the original article, I have decided to leave out several topics related to object detection and facial recognition systems, as well as some of the more esoteric network architectures and practices currently being trialed in the research literature. I will likely discuss these in a future article related more specifically to the application of deep learning for computer vision.
Computer vision is an interdisciplinary field that has been gaining huge amounts of traction in the recent years(since CNN) and self-driving cars have taken centre stage. Another integral part of computer vision is object detection. Object detection aids in pose estimation, vehicle detection, surveillance etc. The difference between object detection algorithms and classification algorithms is that in detection algorithms, we try to draw a bounding box around the object of interest to locate it within the image. Also, you might not necessarily draw just one bounding box in an object detection case, there could be many bounding boxes representing different objects of interest within the image and you would not know how many beforehand.
With the rise of self-driving vehicles comes the risk of accidents and the need for higher safety, and protection for pedestrian detection in the following scenarios: imminent crashes, thus the car should crash into an object and avoid the pedestrian, and in the case of road intersections, where it is important for the car to stop when pedestrians are crossing. Currently, a special topology of deep neural networks called Fused Deep Neural Network (F-DNN) is considered to be the state of the art in pedestrian detection, as it has the lowest miss rate, yet it is very slow. Therefore, acceleration is needed to speed up the performance. This project proposes two contributions to address this problem, by using a deep neural network used for object detection, called Single Shot Multi-Box Detector (SSD). The first contribution is training and tuning the hyperparameters of SSD to improve pedestrian detection. The second contribution is a new FPGA design for accelerating the model on the Altera Arria 10 platform. The final system will be used in self-driving vehicles in real-time. Preliminary results of the improved SSD shows 3% higher miss-rate than F-DNN on Caltech pedestrian detection benchmark, but 4x performance improvement. The acceleration design is expected to achieve an additional performance improvement significantly outweighing the minimal difference in accuracy.