Yang, Zhuoqian
3DHumanGAN: 3D-Aware Human Image Generation with 3D Pose Mapping
Yang, Zhuoqian, Li, Shikai, Wu, Wayne, Dai, Bo
We present 3DHumanGAN, a 3D-aware generative adversarial network that synthesizes photo-like images of full-body humans with consistent appearances under different view-angles and body-poses. To tackle the representational and computational challenges in synthesizing the articulated structure of human bodies, we propose a novel generator architecture in which a 2D convolutional backbone is modulated by a 3D pose mapping network. The 3D pose mapping network is formulated as a renderable implicit function conditioned …

Human image generation is a long-standing topic in computer vision and graphics with applications across multiple areas of interest, including movie production, social networking and e-commerce. Compared to physically-based methods, data-driven approaches are preferred due to the photolikeness of their results, versatility and ease of use [60]. In this work, we are interested in synthesizing full-body human images with a 3D-aware generative adversarial network (GAN) that produces appearance-consistent images under different view-angles and body-poses.
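As a rough illustration of the generator design described above (a sketch under stated assumptions, not the authors' code), the PyTorch snippet below shows a 2D convolutional backbone whose feature maps are modulated by parameters predicted from a 3D pose input. The class names, the flat 72-dimensional pose vector, and the per-channel scale/shift modulation are illustrative assumptions; 3DHumanGAN's actual mapping network is a renderable implicit function conditioned on a posed 3D human representation.

    # Minimal sketch, not the authors' implementation: a 2D convolutional
    # backbone modulated by parameters predicted from a 3D pose input.
    # Class names, the flat 72-d pose vector and the scale/shift modulation
    # are illustrative assumptions.
    import torch
    import torch.nn as nn

    class PoseMappingNet(nn.Module):
        """Maps a pose representation to per-channel modulation parameters."""
        def __init__(self, pose_dim=72, feat_ch=64):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(pose_dim, 256), nn.ReLU(),
                nn.Linear(256, 2 * feat_ch),  # a scale and a shift per channel
            )

        def forward(self, pose):  # pose: (B, pose_dim)
            scale, shift = self.mlp(pose).chunk(2, dim=-1)
            return scale[..., None, None], shift[..., None, None]

    class ModulatedBackbone(nn.Module):
        """2D convolutional generator whose features are pose-modulated."""
        def __init__(self, z_dim=128, feat_ch=64):
            super().__init__()
            self.feat_ch = feat_ch
            self.stem = nn.Linear(z_dim, feat_ch * 16 * 16)
            self.conv = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)
            self.to_rgb = nn.Conv2d(feat_ch, 3, 1)
            self.pose_map = PoseMappingNet(feat_ch=feat_ch)

        def forward(self, z, pose):
            x = self.stem(z).view(z.size(0), self.feat_ch, 16, 16)
            scale, shift = self.pose_map(pose)
            x = torch.relu(self.conv(x) * (1 + scale) + shift)  # pose conditioning
            return torch.tanh(self.to_rgb(x))

    # Toy usage: latent codes plus pose vectors -> a batch of (3, 16, 16) images.
    imgs = ModulatedBackbone()(torch.randn(2, 128), torch.randn(2, 72))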
Multi-modal Learning with Prior Visual Relation Reasoning
Yang, Zhuoqian, Yu, Jing, Yang, Chenghao, Qin, Zengchang, Hu, Yue
Visual relation reasoning is a central component of recent cross-modal analysis tasks; it aims to reason about the visual relationships between objects and their properties. These relationships convey rich semantics and help enhance the visual representation, improving cross-modal analysis. Previous works have succeeded in designing strategies for modeling latent relations or rigidly categorized relations and have achieved performance gains. However, such methods overlook the ambiguity inherent in relations, which arises from the diverse relational semantics of different visual appearances. In this work, we explore modeling relations with context-sensitive embeddings based on human prior knowledge. We propose a novel plug-and-play relation reasoning module, injected with these relation embeddings, to enhance the image encoder. Specifically, we design upgraded Graph Convolutional Networks (GCNs) that exploit the relation embeddings and the directionality of relations between objects to generate relation-aware image representations. We demonstrate the effectiveness of the relation reasoning module by applying it to both Visual Question Answering (VQA) and Cross-Modal Information Retrieval (CMIR) tasks. Extensive experiments on the VQA 2.0 and CMPlaces datasets show superior performance compared with state-of-the-art methods.
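As a rough illustration of the relation-aware graph convolution idea (a sketch under stated assumptions, not the paper's implementation), the PyTorch snippet below passes messages along and against each directed edge between objects, concatenating a learned relation embedding into every message. The layer name, feature dimensions, and aggregation scheme are assumptions made for the example.

    # Minimal sketch, not the paper's implementation: a graph convolution over
    # detected objects where each directed edge carries a relation embedding,
    # so messages are relation- and direction-aware. Names and dimensions are
    # illustrative assumptions.
    import torch
    import torch.nn as nn

    class RelationAwareGCNLayer(nn.Module):
        def __init__(self, obj_dim=2048, rel_dim=300, out_dim=1024):
            super().__init__()
            self.msg_fwd = nn.Linear(obj_dim + rel_dim, out_dim)  # subject -> object
            self.msg_bwd = nn.Linear(obj_dim + rel_dim, out_dim)  # object -> subject
            self.self_loop = nn.Linear(obj_dim, out_dim)

        def forward(self, obj_feats, edges, rel_embeds):
            # obj_feats: (N, obj_dim) region features; edges: (E, 2) rows of
            # [subject, object] indices; rel_embeds: (E, rel_dim) relation embeddings.
            subj, obj = edges[:, 0], edges[:, 1]
            out = self.self_loop(obj_feats)
            m_fwd = self.msg_fwd(torch.cat([obj_feats[subj], rel_embeds], dim=-1))
            m_bwd = self.msg_bwd(torch.cat([obj_feats[obj], rel_embeds], dim=-1))
            out = out.index_add(0, obj, m_fwd)   # aggregate along edge direction
            out = out.index_add(0, subj, m_bwd)  # aggregate against edge direction
            return torch.relu(out)               # relation-aware object features

    # Toy usage: 4 objects connected by 3 directed relations.
    feats = torch.randn(4, 2048)
    edges = torch.tensor([[0, 1], [1, 2], [3, 0]])
    rels = torch.randn(3, 300)
    enhanced = RelationAwareGCNLayer()(feats, edges, rels)  # shape (4, 1024)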