 fast r-cnn




Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Neural Information Processing Systems

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [7] and Fast R-CNN [5] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model [19], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image.
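
The abstract says the RPN emits a box and an objectness score at each position and that detection uses 300 proposals per image. One plausible way to reduce many scored boxes to a fixed budget is sketched below: rank by objectness, suppress near-duplicates with greedy non-maximum suppression, and keep the top N. This is an illustrative sketch, not the authors' implementation; the names `iou` and `select_proposals` are hypothetical, and the 0.7 overlap threshold is an assumption not stated in the abstract.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_proposals(
    boxes: Sequence[Box],
    scores: Sequence[float],
    top_n: int = 300,       # proposal budget quoted in the abstract
    nms_iou: float = 0.7,   # assumed overlap threshold (not from the abstract)
) -> List[int]:
    """Return indices of the kept proposals, highest objectness first."""
    # Visit boxes from highest to lowest objectness score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if len(kept) == top_n:
            break
        # Keep a box only if it does not overlap an already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= nms_iou for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (0, 0, 10, 9), (20, 20, 30, 30)]
    demo_scores = [0.9, 0.8, 0.7]
    print(select_proposals(demo_boxes, demo_scores))
```

In the demo, the second box overlaps the first with IoU 0.9 and is suppressed, so only the first and third boxes survive.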


Tree-Structured Reinforcement Learning for Sequential Object Localization

Neural Information Processing Systems

Existing object proposal algorithms usually search for possible object regions over multiple locations and scales separately, which ignores the interdependency among different objects and deviates from the human perception procedure. To incorporate global interdependency between objects into object localization, we propose an effective Tree-structured Reinforcement Learning (Tree-RL) approach to sequentially search for objects by fully exploiting both the current observation and historical search paths. The Tree-RL approach learns multiple searching policies by maximizing the long-term reward that reflects localization accuracies over all the objects. Starting with the entire image as a proposal, the Tree-RL approach allows the agent to sequentially discover multiple objects via a tree-structured traversing scheme. Allowing multiple near-optimal policies, Tree-RL offers more diversity in search paths and is able to find multiple objects with a single feedforward pass. Therefore, Tree-RL can better cover different objects at various scales, which is quite appealing in the context of object proposal. Experiments on PASCAL VOC 2007 and 2012 validate the effectiveness of Tree-RL, which achieves recall comparable to current object proposal algorithms with far fewer candidate windows.


Unsupervised domain adaptation by learning using privileged information

Breitholtz, Adam, Matsson, Anton, Johansson, Fredrik D.

arXiv.org Artificial Intelligence

Successful unsupervised domain adaptation (UDA) is guaranteed only under strong assumptions such as covariate shift and overlap between input domains. The latter is often violated in high-dimensional applications such as image classification which, despite this challenge, continues to serve as inspiration and benchmark for algorithm development. In this work, we show that access to side information about examples from the source and target domains can help relax these assumptions and increase sample efficiency in learning, at the cost of collecting a richer variable set. We call this domain adaptation by learning using privileged information (DALUPI). Tailored for this task, we propose a simple two-stage learning algorithm inspired by our analysis and a practical end-to-end algorithm for multi-label image classification. In a suite of experiments, including an application to medical image analysis, we demonstrate that incorporating privileged information in learning can reduce errors in domain transfer compared to classical learning.


Use Cases and Roll-Out Tips for Image Recognition in Retail

#artificialintelligence

Hit hard by the pandemic, the retail sector is on the lookout for innovation. Among the many technologies retailers focus on, artificial intelligence is an undeniable leader. The market for artificial intelligence solutions in retail is projected to reach $23.32 billion by 2027, quite a leap compared to $5.06 billion in 2021. Within AI, computer vision and image recognition have become notable areas of interest for the retail sector: the global market for retail image recognition software is expected to grow at a CAGR of 22% and reach $3.7 billion by 2025. By bringing image recognition into their technology mixes, retailers hope to optimize inventories, simplify checkouts, and improve customer experience.


Here's how deep learning helps computers detect objects

#artificialintelligence

Deep neural networks have gained fame for their capability to process visual information. And in the past few years, they have become a key component of many computer vision applications. Among the key problems neural networks can solve is detecting and localizing objects in images. Object detection is used in many different domains, including autonomous driving, video surveillance, and healthcare. In this post, I will briefly review the deep learning architectures that help computers detect objects.


An introduction to object detection with deep learning

#artificialintelligence

This article is part of "Deconstructing artificial intelligence," a series of posts that explore the details of how AI applications work (In partnership with Paperspace). Deep neural networks have gained fame for their capability to process visual information. And in the past few years, they have become a key component of many computer vision applications. Among the key problems neural networks can solve is detecting and localizing objects in images. Object detection is used in many different domains, including autonomous driving, video surveillance, and healthcare.


Object sorting using faster R-CNN

Chen, Pengchang, Elangovan, Vinayak

arXiv.org Artificial Intelligence

In a factory production line, different industrial parts need to be quickly differentiated and sorted for further processing. Parts can be of different colors and shapes. It is tedious for humans to differentiate and sort these objects into appropriate categories. Automating this process would save time and cost. In the automation process, choosing an appropriate model to detect and classify different objects based on specific features is challenging. In this paper, three different neural network models are compared for the object sorting system: CNN, Fast R-CNN, and Faster R-CNN. These models are tested, and their performance is analyzed. Moreover, for the object sorting system, an Arduino-controlled 5 DoF (degree of freedom) robot arm is programmed to grab symmetrical objects and drop them in the targeted zone. Objects are categorized into classes based on color and on whether they are defective or non-defective.