Learning Adversarial 3D Model Generation With 2D Image Enhancer

AAAI Conferences

Recent advancements in generative adversarial nets (GANs) and volumetric convolutional neural networks (CNNs) enable generating 3D models from a probabilistic space. In this paper, we develop a novel GAN-based deep neural network to obtain a better latent space for the generation of 3D models. In the proposed method, an enhancer neural network is introduced to extract information from other corresponding domains (e.g., images) to improve both the performance of the 3D model generator and the discriminative power of the unsupervised shape features learned by the 3D model discriminator. Specifically, we train the 3D generative adversarial networks on 3D volumetric models, and at the same time, the enhancer network learns image features from rendered images. Unlike the traditional GAN architecture, which uses uninformative random vectors as inputs, we feed the high-level image features learned by the enhancer into the 3D model generator for better training. The evaluations on two large-scale 3D model datasets, ShapeNet and ModelNet, demonstrate that our proposed method can not only generate high-quality 3D models, but also successfully learn discriminative shape representations for classification and retrieval without supervision.

Dialog-based Interactive Image Retrieval

Neural Information Processing Systems

Existing methods for interactive image retrieval have demonstrated the merit of integrating user feedback, improving retrieval results. However, most current systems rely on restricted forms of user feedback, such as binary relevance responses, or feedback based on a fixed set of relative attributes, which limits their impact. In this paper, we introduce a new approach to interactive image search that enables users to provide feedback via natural language, allowing for more natural and effective interaction. We formulate the task of dialog-based interactive image retrieval as a reinforcement learning problem, and reward the dialog system for improving the rank of the target image during each dialog turn. To mitigate the cumbersome and costly process of collecting human-machine conversations as the dialog system learns, we train our system with a user simulator, which is itself trained to describe the differences between target and candidate images.
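
The rank-based reward described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of per-turn similarity scores, the worst-rank starting point, and the normalization by the number of candidates are all assumptions made for illustration.

```python
def target_rank(scores, target_index):
    """1-based rank of the target image; a higher score means a better match."""
    target_score = scores[target_index]
    return 1 + sum(1 for s in scores if s > target_score)


def rank_improvement_rewards(scores_per_turn, target_index):
    """Hypothetical per-turn reward: how far the target moved up the ranking.

    scores_per_turn[t][i] is an assumed similarity score between the dialog
    state after turn t and candidate image i.
    """
    num_candidates = len(scores_per_turn[0])
    previous_rank = num_candidates  # assumption: start from the worst rank
    rewards = []
    for scores in scores_per_turn:
        rank = target_rank(scores, target_index)
        # Positive when the target moved up, negative when it moved down.
        rewards.append((previous_rank - rank) / num_candidates)
        previous_rank = rank
    return rewards


# Toy example: 4 candidate images, the target is image 2.
toy_scores = [
    [0.9, 0.5, 0.1, 0.3],  # turn 1: target ranked 4th
    [0.2, 0.5, 0.4, 0.3],  # turn 2: target ranked 2nd
    [0.2, 0.5, 0.9, 0.3],  # turn 3: target ranked 1st
]
toy_rewards = rank_improvement_rewards(toy_scores, target_index=2)
```

In this toy run the reward is zero on the first turn, largest on the second turn when the target jumps two places, and smaller on the third turn when it gains one more place.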

Improving Microblog Retrieval from Exterior Corpus by Automatically Constructing Microblogging Corpus

AAAI Conferences

A large-scale training corpus consisting of microblogs belonging to a desired category is important for high-accuracy microblog retrieval. Obtaining such a large-scale microblogging corpus manually is very time- and labor-consuming. Therefore, some models for the automatic retrieval of microblogs from an exterior corpus have been proposed. However, these approaches may fail to consider microblog-specific features. To alleviate this issue, we propose a methodology that constructs a simulated microblogging corpus rather than directly building a model from the exterior corpus. Our model performs better because the retrieval model ultimately uses the microblog-specific knowledge of the microblogging corpus. Experimental results on real-world microblogs demonstrate the superiority of our technique over previous approaches.

Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries

Neural Information Processing Systems

This paper explores the task of interactive image retrieval using natural language queries, where a user progressively provides input queries to refine a set of retrieval results. Moreover, our work explores this problem in the context of complex image scenes containing multiple objects. We propose Drill-down, an effective framework for encoding multiple queries with an efficient compact state representation that significantly extends current methods for single-round image retrieval. We show that using multiple rounds of natural language queries as input can be surprisingly effective for finding arbitrarily specific images of complex scenes. Furthermore, we find that existing image datasets with textual captions can provide a surprisingly effective form of weak supervision for this task.
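
The idea of folding several rounds of queries into one compact state can be illustrated with a toy sketch. This is not the Drill-down architecture: the running-mean update, the cosine ranking, and all names below are assumptions chosen only to show a state whose size stays fixed as the number of rounds grows.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def update_state(state, query_vec, turn):
    """Assumed update rule: running mean of all query embeddings seen so far."""
    return [(s * turn + q) / (turn + 1) for s, q in zip(state, query_vec)]


def run_rounds(query_vecs, image_vecs):
    """Return the image ranking (best first) after each query round."""
    state = [0.0] * len(image_vecs[0])
    rankings = []
    for turn, query_vec in enumerate(query_vecs):
        state = update_state(state, query_vec, turn)
        order = sorted(
            range(len(image_vecs)),
            key=lambda i: cosine(state, image_vecs[i]),
            reverse=True,
        )
        rankings.append(order)
    return rankings


# Toy embeddings: image 2 contains both concepts that the two queries mention.
toy_images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
toy_queries = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_rankings = run_rounds(toy_queries, toy_images)
```

After the first query the single-concept image 0 ranks first; after the second query the accumulated state matches image 2, which contains both concepts, so it moves to the top.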