Institute of Automation, Chinese Academy of Sciences (CASIA)
DF²Net: Discriminative Feature Learning and Fusion Network for RGB-D Indoor Scene Classification
Li, Yabei (Institute of Automation, Chinese Academy of Sciences (CASIA)) | Zhang, Junge (Institute of Automation, Chinese Academy of Sciences (CASIA)) | Cheng, Yanhua (Tencent) | Huang, Kaiqi (Institute of Automation, Chinese Academy of Sciences (CASIA)) | Tan, Tieniu (Institute of Automation, Chinese Academy of Sciences (CASIA))
This paper focuses on the task of RGB-D indoor scene classification. The task is challenging for two reasons: 1) learning a robust representation for indoor scenes is difficult because of the wide variety of objects and layouts; 2) fusing the complementary cues in RGB and depth is nontrivial since there are large semantic gaps between the two modalities. Most existing works learn representations for classification by training a deep network with a softmax loss and fuse the two modalities by simply concatenating their features. However, these pipelines do not explicitly consider intra-class and inter-class similarity, nor the intrinsic relationships between modalities. To address these problems, this paper proposes a Discriminative Feature Learning and Fusion Network (DF²Net) with two-stage training. In the first stage, to better represent the scene in each modality, a deep multi-task network is constructed to simultaneously minimize a structured loss and a softmax loss. In the second stage, we design a novel discriminative fusion network that learns correlative features across modalities and distinctive features of each modality. Extensive analysis and experiments on the SUN RGB-D dataset and the NYU Depth Dataset V2 show the superiority of DF²Net over other state-of-the-art methods on the RGB-D indoor scene classification task.
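The abstract gives no implementation details, but the first-stage multi-task objective can be pictured with the minimal PyTorch sketch below. It assumes the structured loss takes a triplet-margin form and uses a toy backbone, class count, and random batch in place of the paper's actual network and data.

```python
import torch
import torch.nn as nn

# Sketch of one modality branch trained with softmax + structured loss.
# Backbone, feature dimension, and class count are placeholders (assumptions).
class ModalityNet(nn.Module):
    def __init__(self, feat_dim=128, num_classes=19):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in for a CNN backbone
            nn.Flatten(),
            nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feat = self.backbone(x)
        return feat, self.classifier(feat)

softmax_loss = nn.CrossEntropyLoss()
structured_loss = nn.TripletMarginLoss(margin=0.3)  # assumed form of the structured loss

net = ModalityNet()
anchor, positive, negative = (torch.randn(8, 3, 64, 64) for _ in range(3))
labels = torch.randint(0, 19, (8,))

feat_a, logits = net(anchor)
feat_p, _ = net(positive)
feat_n, _ = net(negative)

# Joint multi-task objective: classification term + metric-learning term.
loss = softmax_loss(logits, labels) + structured_loss(feat_a, feat_p, feat_n)
loss.backward()
```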
Unsupervised Part-Based Weighting Aggregation of Deep Convolutional Features for Image Retrieval
Xu, Jian (Institute of Automation, Chinese Academy of Sciences (CASIA)) | Shi, Cunzhao (University of Chinese Academy of Sciences) | Qi, Chengzuo (Institute of Automation, Chinese Academy of Sciences (CASIA)) | Wang, Chunheng (Institute of Automation, Chinese Academy of Sciences (CASIA)) | Xiao, Baihua (University of Chinese Academy of Sciences)
In this paper, we propose a simple but effective semantic part-based weighting aggregation (PWA) for image retrieval. The proposed PWA utilizes the discriminative filters of deep convolutional layers as part detectors. Moreover, we propose an effective unsupervised strategy to select part detectors that generate "probabilistic proposals," which highlight discriminative parts of objects and suppress background noise. The final global PWA representation is then obtained by aggregating the regional representations weighted by the selected "probabilistic proposals" corresponding to various semantic content. We conduct comprehensive experiments on four standard datasets and show that our unsupervised PWA outperforms state-of-the-art unsupervised and supervised aggregation methods.
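A rough NumPy sketch of the aggregation step described above: selected convolutional filters provide spatial weight maps ("probabilistic proposals") that weight sum-pooled regional descriptors, which are concatenated and normalized into the global representation. The filter indices, power exponent, and normalization scheme here are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def pwa_aggregate(feat_map, selected, alpha=2.0, eps=1e-8):
    """Aggregate a conv feature map (C, H, W) using `selected` filters as part detectors."""
    C, H, W = feat_map.shape
    descriptors = []
    for k in selected:
        w = feat_map[k] ** alpha                              # "probabilistic proposal" from filter k
        w = w / (w.sum() + eps)                               # normalize to a spatial weight map
        d = (feat_map * w[None]).reshape(C, -1).sum(axis=1)   # weighted sum-pooling over locations
        descriptors.append(d / (np.linalg.norm(d) + eps))     # l2-normalize regional descriptor
    pwa = np.concatenate(descriptors)
    return pwa / (np.linalg.norm(pwa) + eps)                  # final global PWA representation

# Toy usage with a random feature map and arbitrarily chosen filter indices.
feat_map = np.random.rand(512, 14, 14).astype(np.float32)
rep = pwa_aggregate(feat_map, selected=[3, 42, 77, 256])
print(rep.shape)  # (4 * 512,)
```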
Unsupervised Learning of Multi-Level Descriptors for Person Re-Identification
Yang, Yang (Institute of Automation, Chinese Academy of Sciences (CASIA)) | Wen, Longyin (State University of New York at Albany) | Lyu, Siwei (State University of New York at Albany) | Li, Stan Z. (Institute of Automation, Chinese Academy of Sciences (CASIA))
In this paper, we propose a novel coding method named weighted linear coding (WLC) to learn multi-level (e.g., pixel-level, patch-level, and image-level) descriptors from raw pixel data in an unsupervised manner. A similarity constraint guarantees the saliency property, and the resulting multi-level descriptors strike a good balance between robustness and distinctiveness. Based on WLC, all data from the same region can be jointly encoded, so spatial consistency is preserved when extracting holistic image features. Furthermore, we apply PCA to these features to obtain compact person representations. When matching persons, we exploit the complementary information residing in the multi-level descriptors via a score-level fusion strategy. Experiments on the challenging person re-identification datasets VIPeR and CUHK01 demonstrate the effectiveness of our method.
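The matching stage can be pictured with the following hypothetical NumPy sketch: descriptors at each level are compacted by PCA and compared independently, and the per-level similarity scores are combined by a weighted sum. The cosine similarity, PCA dimension, and fusion weights are assumptions for illustration; the abstract does not specify the actual WLC encoding or fusion settings.

```python
import numpy as np

def pca_compact(X, dim=64):
    """Project descriptors X (n_samples, n_features) onto the top `dim` principal components."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T

def fused_score(query_levels, gallery_levels, weights):
    """Score-level fusion: weighted sum of per-level cosine similarities."""
    score = 0.0
    for q, g, w in zip(query_levels, gallery_levels, weights):
        score += w * float(q @ g / (np.linalg.norm(q) * np.linalg.norm(g) + 1e-8))
    return score

# Toy usage: random pixel-/patch-/image-level descriptors, PCA-compacted, then fused.
rng = np.random.default_rng(0)
raw = [rng.standard_normal((100, 512)) for _ in range(3)]     # per-level raw descriptors
levels = [pca_compact(X, dim=64) for X in raw]                # compact person representations
print(fused_score([L[0] for L in levels],                     # query: first sample per level
                  [L[1] for L in levels],                     # gallery: second sample per level
                  weights=[0.2, 0.3, 0.5]))                   # assumed fusion weights
```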