SUN RGB-D dataset

3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection

arXiv.org Artificial Intelligence

A major challenge in monocular 3D object detection is the limited diversity and quantity of objects in real datasets. While augmenting real scenes with virtual objects holds promise to improve both the diversity and quantity of the objects, it remains elusive due to the lack of an effective 3D object insertion method for complex, real captured scenes. In this work, we study augmenting complex real indoor scenes with virtual objects for monocular 3D object detection. The main challenge is to automatically identify plausible physical properties for virtual assets (e.g., locations, appearances, and sizes) in cluttered real scenes. To address this challenge, we propose a physically plausible indoor 3D object insertion approach that automatically copies virtual objects and pastes them into real scenes. The resulting objects in scenes have 3D bounding boxes with plausible physical locations and appearances. In particular, our method first identifies physically feasible locations and poses for the inserted objects to prevent collisions with the existing room layout. Subsequently, it estimates spatially-varying illumination for the insertion location, enabling the immersive blending of the virtual objects into the original scene with plausible appearances and cast shadows. We show that our augmentation method significantly improves existing monocular 3D object detection models and achieves state-of-the-art performance. For the first time, we demonstrate that physically plausible 3D object insertion, serving as a generative data augmentation technique, can lead to significant improvements for discriminative downstream tasks such as monocular 3D object detection. Project website: https://gyhandy.github.io/3D-Copy-Paste/
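
The first step in the abstract above, finding a feasible location that avoids collisions with the existing layout, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes axis-aligned rectangular footprints on the floor plane and simple rejection sampling, and the names `Box` and `sample_placement` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned footprint on the floor plane: centre (cx, cy), width w, depth d."""
    cx: float
    cy: float
    w: float
    d: float

    def overlaps(self, other: "Box") -> bool:
        # Two axis-aligned rectangles overlap when their centres are closer
        # than the sum of their half-extents along both axes.
        return (abs(self.cx - other.cx) < (self.w + other.w) / 2
                and abs(self.cy - other.cy) < (self.d + other.d) / 2)


def sample_placement(room_w, room_d, obj_w, obj_d, existing, rng, max_tries=1000):
    """Rejection-sample a footprint inside the room that hits no existing box.

    Assumption: the room is a rectangle spanning [0, room_w] x [0, room_d].
    Returns a Box, or None if no free spot was found within max_tries.
    """
    if obj_w > room_w or obj_d > room_d:
        return None  # the object cannot fit in the room at all
    for _ in range(max_tries):
        candidate = Box(
            rng.uniform(obj_w / 2, room_w - obj_w / 2),
            rng.uniform(obj_d / 2, room_d - obj_d / 2),
            obj_w,
            obj_d,
        )
        if not any(candidate.overlaps(b) for b in existing):
            return candidate
    return None


if __name__ == "__main__":
    furniture = [Box(2.0, 2.0, 1.0, 1.0)]  # hypothetical existing object
    spot = sample_placement(4.0, 4.0, 1.0, 1.0, furniture, random.Random(0))
    print(spot)
```

A full pipeline would also need object pose, scale, and the lighting estimation mentioned in the abstract; this sketch covers only the collision test.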


Centroid Based Concept Learning for RGB-D Indoor Scene Classification

arXiv.org Artificial Intelligence

Classifying images taken from indoor scenes is an important area of research. The development of an accurate indoor scene classifier has the potential to improve indoor localization and decision-making for domestic robots, offer new applications for wearable computer users, and generally result in better vision-based situation awareness, thus impacting a wide variety of applications. The introduction of deep learning methods, the creation of numerous large-scale datasets, and the development of specialized computing hardware have all contributed to the rapid improvement in image classification performance. One reason for deep learning's success has been the ability to learn multiple layers of generic image features that can then be used on other related computer vision problems. For instance, features from object-trained image classifiers have been used to train indoor scene classifiers [27]. Yet, indoor scene classification is a challenging problem on its own.


DF²Net: Discriminative Feature Learning and Fusion Network for RGB-D Indoor Scene Classification

AAAI Conferences

This paper focuses on the task of RGB-D indoor scene classification. It is a very challenging task for two reasons. 1) Learning robust representations of indoor scenes is difficult because of the variety of objects and layouts. 2) Fusing the complementary cues in RGB and depth is nontrivial since there are large semantic gaps between the two modalities. Most existing works learn representations for classification by training a deep network with softmax loss and fuse the two modalities by simply concatenating their features. However, these pipelines do not explicitly consider intra-class and inter-class similarity or inter-modal intrinsic relationships. To address these problems, this paper proposes a Discriminative Feature Learning and Fusion Network (DF²Net) with two-stage training. In the first stage, to better represent the scene in each modality, a deep multi-task network is constructed to simultaneously minimize the structured loss and the softmax loss. In the second stage, we design a novel discriminative fusion network that is able to learn correlative features of multiple modalities and distinctive features of each modality. Extensive analysis and experiments on the SUN RGB-D Dataset and the NYU Depth Dataset V2 show the superiority of DF²Net over other state-of-the-art methods on the RGB-D indoor scene classification task.
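
The first training stage described above minimizes a structured loss together with the softmax loss. The abstract does not define the structured loss, so the sketch below substitutes a standard triplet margin loss as a stand-in. The function names, the `weight` balance factor, and the `margin` value are all assumptions made for illustration, not details from the paper.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of the softmax over `logits` against the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Stand-in structured loss: pull same-class features together and push
    different-class features at least `margin` further away."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


def joint_loss(logits, label, anchor, positive, negative, weight=0.5, margin=1.0):
    """Multi-task objective: softmax loss plus a weighted structured term."""
    return (softmax_cross_entropy(logits, label)
            + weight * triplet_loss(anchor, positive, negative, margin))


if __name__ == "__main__":
    # Hypothetical two-class example with 2-D features.
    print(joint_loss([2.0, 0.5], 0, [0.0, 0.0], [0.1, 0.0], [3.0, 0.0]))
```

The classification term alone only separates classes at the decision boundary; the added metric term is one common way to make intra-class and inter-class similarity explicit in the learned features, which is the gap the abstract points to.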