
Collaborating Authors

 He, Xiangnan


Group-Pair Convolutional Neural Networks for Multi-View Based 3D Object Retrieval

AAAI Conferences

In recent years, research interest in object retrieval has shifted from 2D towards 3D data. Despite many well-designed approaches, limitations remain and there is substantial room for improvement, including the heavy reliance on hand-crafted features, the separate optimization of feature extraction and object retrieval, and the lack of sufficient training samples. In this work, we address these limitations for 3D object retrieval by developing a novel end-to-end solution named Group-Pair Convolutional Neural Network (GPCNN). It jointly learns visual features from multiple views of a 3D model and optimizes directly for the object retrieval task. To tackle the insufficient training data issue, we employ a pair-wise learning scheme that learns model parameters from the similarity of each sample pair, rather than from the sparse label-sample matches used in traditional training. Extensive experiments on three public benchmarks show that our GPCNN solution significantly outperforms state-of-the-art methods, improving retrieval accuracy by 3% to 42%.
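The abstract does not include implementation details; the sketch below only illustrates the general idea of pair-wise similarity learning over pooled multi-view embeddings, assuming a shared per-view CNN and a contrastive-style loss. All class names, layer sizes, and the margin value are illustrative assumptions, not the authors' GPCNN architecture.

```python
# Hypothetical sketch of pair-wise similarity learning over multi-view embeddings.
# Not the authors' GPCNN code; names and hyper-parameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewEncoder(nn.Module):
    """Encodes a set of rendered views of one 3D model into a single embedding."""
    def __init__(self, dim=128):
        super().__init__()
        self.cnn = nn.Sequential(               # stand-in for a shared per-view CNN
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, dim),
        )

    def forward(self, views):                   # views: (batch, n_views, 3, H, W)
        b, v = views.shape[:2]
        feats = self.cnn(views.flatten(0, 1))   # encode each view independently
        feats = feats.view(b, v, -1).max(dim=1).values  # pool across views
        return F.normalize(feats, dim=-1)

def pairwise_loss(emb_a, emb_b, same_label, margin=0.5):
    """Contrastive loss: pull matching pairs together, push non-matching pairs apart."""
    dist = (emb_a - emb_b).norm(dim=-1)
    pos = same_label * dist.pow(2)
    neg = (1 - same_label) * F.relu(margin - dist).pow(2)
    return (pos + neg).mean()
```

A scheme of this kind helps when labelled 3D models are scarce because every labelled sample can participate in many training pairs, which is the motivation the abstract gives for learning from pair similarities rather than sparse label-sample matches.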


VELDA: Relating an Image Tweet’s Text and Images

AAAI Conferences

Image tweets are becoming a prevalent form of social media, but little is known about their content, textual and visual, and the relationship between the two media. Our analysis of image tweets shows that while visual elements certainly play a large role in image-text relationships, other factors, such as emotional elements, also shape the relationship. We develop Visual-Emotional LDA (VELDA), a novel topic model that captures the image-text correlation from multiple perspectives (namely, visual and emotional). Experiments on real-world image tweets in both English and Chinese, as well as on other user-generated content, show that VELDA significantly outperforms existing methods on cross-modality image retrieval. Even in domains where emotion does not directly factor into image choice, VELDA demonstrates good generalization ability, achieving higher-fidelity modeling of such multimedia documents.
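VELDA's generative model and inference procedure are not reproduced here; as a rough illustration of how topic-level representations can drive cross-modality retrieval, the sketch below ranks candidate images for a text query by comparing inferred topic distributions. The function name, the symmetric KL score, and the assumption that topic distributions have already been inferred are hypothetical simplifications, not VELDA's actual retrieval procedure.

```python
# Hypothetical sketch of cross-modality retrieval with topic proportions.
# Assumes each query text and candidate image has already been mapped to a
# topic distribution by some topic model; this is not VELDA's inference code.
import numpy as np

def rank_images_for_text(theta_text, image_thetas):
    """Rank candidate images by how closely their topic mix matches the text query.

    theta_text:   (K,) topic distribution inferred from the query text
    image_thetas: (N, K) topic distributions inferred from the candidate images
    Returns candidate indices, best match first.
    """
    eps = 1e-12                                  # avoid log(0) and division by zero
    p = theta_text + eps
    q = image_thetas + eps
    kl_pq = np.sum(p * np.log(p / q), axis=1)    # KL(text || image)
    kl_qp = np.sum(q * np.log(q / p), axis=1)    # KL(image || text)
    return np.argsort(kl_pq + kl_qp)             # smaller divergence = better match
```

The symmetric divergence is just one plausible matching score; any similarity over the shared topic space (cosine, dot product) could stand in for it in this sketch.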