Goto

Collaborating Authors

 object proposal




A Basic Functions

Neural Information Processing Systems

Each question in PTR is associated with a functional program built from a set of basic functions. A.1 Data Types Our basic functional building blocks operate on values of the following types: Object: A single object in the scene. Part-level functions are listed in Table 4. B have certain spatial relationships. For NS-VQA, we first use Mask-RCNN to propose segmentations for objects and parts. If an object is unstable, possible changes (to_left, to_right, to_front, to_behind) are predicted.


3D Object Proposals for Accurate Object Class Detection

Neural Information Processing Systems

The goal of this paper is to generate high-quality 3D object proposals in the context of autonomous driving. Our method exploits stereo imagery to place proposals in the form of 3D bounding boxes. We formulate the problem as minimizing an energy function encoding object size priors, ground plane as well as several depth informed features that reason about free space, point cloud densities and distance to the ground. Our experiments show significant performance gains over existing RGB and RGB-D object proposal methods on the challenging KITTI benchmark. Combined with convolutional neural net (CNN) scoring, our approach outperforms all existing results on all three KITTI object classes.


3D Object Proposals for Accurate Object Class Detection Xiaozhi Chen 1 Andrew Berneshawi

Neural Information Processing Systems

The goal of this paper is to generate high-quality 3D object proposals in the context of autonomous driving. Our method exploits stereo imagery to place proposals in the form of 3D bounding boxes. We formulate the problem as minimizing an energy function encoding object size priors, ground plane as well as several depth informed features that reason about free space, point cloud densities and distance to the ground. Our experiments show significant performance gains over existing RGB and RGB-D object proposal methods on the challenging KITTI benchmark. Combined with convolutional neural net (CNN) scoring, our approach outperforms all existing results on all three KITTI object classes.


Moving Object Proposals with Deep Learned Optical Flow for Video Object Segmentation

arXiv.org Artificial Intelligence

Dynamic scene understanding is one of the most conspicuous field of interest among computer vision community. In order to enhance dynamic scene understanding, pixel-wise segmentation with neural networks is widely accepted. The latest researches on pixel-wise segmentation combined semantic and motion information and produced good performance. In this work, we propose a state of art architecture of neural networks to accurately and efficiently get the moving object proposals (MOP). We first train an unsupervised convolutional neural network (UnFlow) to generate optical flow estimation. Then we render the output of optical flow net to a fully convolutional SegNet model. The main contribution of our work is (1) Fine-tuning the pretrained optical flow model on the brand new DAVIS Dataset; (2) Leveraging fully convolutional neural networks with Encoder-Decoder architecture to segment objects. We developed the codes with TensorFlow, and executed the training and evaluation processes on an AWS EC2 instance.


Open-vocabulary Queryable Scene Representations for Real World Planning

arXiv.org Artificial Intelligence

Abstract-- Large language models (LLMs) have unlocked new capabilities of task planning from human instructions. NLMap first establishes a natural language queryable scene representation with Visual Language models (VLMs). An LLM based object proposal module parses instructions and proposes involved objects to query the scene representation for object availability and location. An LLM planner then plans with such information about the scene. We propose an open-vocabulary and queryable scene representation for real-world planning. The returned object presence and location are used for LLM-based planning. It has to first identify relevant objects and upon it. Recent progress in large language models (LLMs), locations within the scene (e.g., the watering can, the sink, and has shown impressive few-shot performance in language each potential plant) and then plan over these objects in sequential comprehension, semantic understanding, and reasoning, as order (get the watering can, then go the sink, and then fill it well as application to robotics problems like planning [5]-[7] up), conditioning on its affordances (e.g., can it carry a full and instruction following [8]. Using such models in embodied watering can), and conditioning on the scene (e.g., how many settings can provide significant challenges, most critically because plants there are, and where are they).


3D Object Proposals for Accurate Object Class Detection

Neural Information Processing Systems

The goal of this paper is to generate high-quality 3D object proposals in the context of autonomous driving. Our method exploits stereo imagery to place proposals in the form of 3D bounding boxes. We formulate the problem as minimizing an energy function encoding object size priors, ground plane as well as several depth informed features that reason about free space, point cloud densities and distance to the ground. Our experiments show significant performance gains over existing RGB and RGB-D object proposal methods on the challenging KITTI benchmark. Combined with convolutional neural net (CNN) scoring, our approach outperforms all existing results on all three KITTI object classes.


Efficient Object Instance Search Using Fuzzy Objects Matching

AAAI Conferences

Recently, global features aggregated from local convolutional features of the convolutional neural network have shown to be much more effective in comparison with hand-crafted features for image retrieval. However, the global feature might not effectively capture the relevance between the query object and reference images in the object instance search task, especially when the query object is relatively small and there exist multiple types of objects in reference images. Moreover, the object instance search requires to localize the object in the reference image, which may not be achieved through global representations. In this paper, we propose a Fuzzy Objects Matching (FOM) framework to effectively and efficiently capture the relevance between the query object and reference images in the dataset. In the proposed FOM scheme, object proposals are utilized to detect the potential regions of the query object in reference images. To achieve high search efficiency, we factorize the feature matrix of all the object proposals from one reference image into the product of a set of fuzzy objects and sparse codes. In addition, we refine the feature of the generated fuzzy objects according to its neighborhood in the feature space to generate more robust representation. The experimental results demonstrate that the proposed FOM framework significantly outperforms the state-of-the-art methods in precision with less memory and computational cost on three public datasets.


3D Object Proposals for Accurate Object Class Detection

Neural Information Processing Systems

The goal of this paper is to generate high-quality 3D object proposals in the context of autonomous driving. Our method exploits stereo imagery to place proposals in the form of 3D bounding boxes. We formulate the problem as minimizing an energy function encoding object size priors, ground plane as well as several depth informed features that reason about free space, point cloud densities and distance to the ground. Our experiments show significant performance gains over existing RGB and RGB-D object proposal methods on the challenging KITTI benchmark. Combined with convolutional neural net (CNN) scoring, our approach outperforms all existing results on all three KITTI object classes.