Tencent
Graph Correspondence Transfer for Person Re-Identification
Zhou, Qin (Institute of Image Processing and Network Engineering, Shanghai Jiao Tong University) | Fan, Heng (Temple University) | Zheng, Shibao (Shanghai Jiao Tong University) | Su, Hang (Tsinghua University) | Li, Xinzhe (Shanghai Jiao Tong University) | Wu, Shuang (Tencent) | Ling, Haibin (Temple University)
In this paper, we propose a graph correspondence transfer (GCT) approach for person re-identification. Unlike existing methods, the GCT model formulates person re-identification as an off-line graph matching and on-line correspondence transferring problem. Specifically, during training, the GCT model learns off-line a set of correspondence templates from positive training pairs with various pose-pair configurations via patch-wise graph matching. During testing, for each pair of test samples, we select a few training pairs with the most similar pose-pair configurations as references, and transfer the correspondences of these references to the test pair for feature distance calculation. The matching score is derived by aggregating distances from different references. For each probe image, the gallery image with the highest matching score is the re-identification result. Compared to existing algorithms, our GCT can handle spatial misalignment caused by large variations in view angles and human poses, owing to the benefits of patch-wise graph matching. Extensive experiments on five benchmarks, including VIPeR, Road, PRID450S, 3DPES and CUHK01, demonstrate the superior performance of the GCT model over other state-of-the-art methods.
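The testing step described above — transferring reference correspondences to a test pair and aggregating patch distances — can be sketched roughly as follows. This is a toy illustration under assumed shapes and names, not the authors' implementation:

```python
import numpy as np

def gct_matching_score(probe_patches, gallery_patches, reference_correspondences):
    """Aggregate patch-wise feature distances over several reference
    correspondence templates (a rough sketch of the GCT scoring idea).

    probe_patches, gallery_patches: (n_patches, dim) arrays of patch features.
    reference_correspondences: list of index arrays; entry r maps probe
    patch i to gallery patch reference_correspondences[r][i].
    """
    scores = []
    for corr in reference_correspondences:
        # Distance between matched patches under this reference template.
        d = np.linalg.norm(probe_patches - gallery_patches[corr], axis=1).sum()
        scores.append(-d)  # smaller distance -> higher matching score
    return float(np.mean(scores))
```

For each probe, ranking gallery images by this aggregated score would then yield the re-identification result.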
DF²Net: Discriminative Feature Learning and Fusion Network for RGB-D Indoor Scene Classification
Li, Yabei (Institute of Automation, Chinese Academy of Sciences (CASIA)) | Zhang, Junge (Institute of Automation, Chinese Academy of Sciences (CASIA)) | Cheng, Yanhua (Tencent) | Huang, Kaiqi (Institute of Automation, Chinese Academy of Sciences (CASIA)) | Tan, Tieniu (Institute of Automation, Chinese Academy of Sciences (CASIA))
This paper focuses on the task of RGB-D indoor scene classification. The task is challenging for two reasons: 1) learning robust representations for indoor scenes is difficult because of the variety of objects and layouts, and 2) fusing the complementary cues in RGB and depth is nontrivial since there are large semantic gaps between the two modalities. Most existing works learn representations for classification by training a deep network with a softmax loss, and fuse the two modalities by simply concatenating their features. However, these pipelines do not explicitly consider intra-class and inter-class similarity or the intrinsic inter-modal relationships. To address these problems, this paper proposes a Discriminative Feature Learning and Fusion Network (DF²Net) with two-stage training. In the first stage, to better represent the scene in each modality, a deep multi-task network is constructed to simultaneously minimize a structured loss and a softmax loss. In the second stage, we design a novel discriminative fusion network which is able to learn correlative features of multiple modalities and distinctive features of each modality. Extensive analysis and experiments on the SUN RGB-D Dataset and NYU Depth Dataset V2 show the superiority of DF²Net over other state-of-the-art methods on the RGB-D indoor scene classification task.
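The first-stage multi-task objective — a softmax loss minimized jointly with a structured loss — might be combined along these lines. The contrastive form of the structured term, the margin, and the weighting are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def multitask_loss(logits, labels, embeddings, margin=0.2, alpha=0.5):
    """Toy combination of a softmax cross-entropy loss and a structured
    (contrastive-style) loss over scene embeddings; alpha weights the
    structured term."""
    # Softmax cross-entropy over class logits (numerically stable).
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()

    # Structured term: pull same-class embeddings together,
    # push different-class embeddings at least `margin` apart.
    structured, pairs = 0.0, 0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            d = np.linalg.norm(embeddings[i] - embeddings[j])
            structured += d if labels[i] == labels[j] else max(0.0, margin - d)
            pairs += 1
    structured /= max(pairs, 1)
    return ce + alpha * structured
```

Minimizing such a joint objective encourages features that are both separable under softmax and well clustered in embedding space.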
Conversational Model Adaptation via KL Divergence Regularization
Li, Juncen (Tencent) | Luo, Ping (Institute of Computing Technology, CAS, Beijing) | Lin, Fen (University of Chinese Academy of Sciences, Beijing) | Chen, Bo (Tencent)
In this study we formulate the problem of conversational model adaptation, where we aim to build a generative conversational model for a target domain based on a limited amount of dialogue data from that target domain and some existing dialogue models from related source domains. This model facilitates the fast building of a chatbot platform, where a new vertical chatbot with only a small amount of conversation data can be supported by other related mature chatbots. Previous studies on model adaptation and transfer learning mostly focus on classification and recommendation problems; how these models work for conversation generation is still unexplored. To this end, we leverage KL divergence (KLD) regularization to adapt the existing conversational models. Specifically, it employs the KLD to measure the distance between the source and target domains. Adding the KLD as a regularization term to the objective function allows the proposed method to utilize information from the source domains effectively. We also evaluate the performance of this adaptation model on online chatbots in the WeChat public-accounts platform, using both the BLEU metric and human judgement. The experiments empirically show that the proposed method visibly improves these evaluation metrics.
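A minimal sketch of a KLD-regularized adaptation objective of the kind described above: the usual negative log-likelihood on target-domain data, plus a KLD term keeping the adapted model's predictive distribution close to the source model's. The variable names, the direction of the KLD, and the single-token framing are assumptions for illustration:

```python
import numpy as np

def kld_regularized_loss(target_probs, source_probs, gold_index, lam=0.1):
    """NLL of the gold token under the adapted (target) model, plus
    lam * KL(source || target) over the output distribution."""
    nll = -np.log(target_probs[gold_index])
    kld = np.sum(source_probs * np.log(source_probs / target_probs))
    return nll + lam * kld
```

When the adapted model agrees with the source model, the KLD term vanishes and the objective reduces to plain maximum likelihood on the target data.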
Mechanism-Aware Neural Machine for Dialogue Response Generation
Zhou, Ganbin (Institute of Computing Technology, Chinese Academy of Sciences) | Luo, Ping (Institute of Computing Technology, Chinese Academy of Sciences) | Cao, Rongyu (Institute of Computing Technology, Chinese Academy of Sciences) | Lin, Fen (Tencent) | Chen, Bo (Tencent) | He, Qing (Institute of Computing Technology, Chinese Academy of Sciences)
Responses to the same utterance in everyday dialogue may differ widely in content semantics, speaking style, communication intention and so on. Previous generative conversational models ignore these 1-to-n relationships between a post and its diverse responses, and tend to return high-frequency but meaningless responses. In this study we propose a mechanism-aware neural machine for dialogue response generation. It assumes that there exist some latent responding mechanisms, each of which can generate different responses to a single input post. With this assumption, we model the different responding mechanisms as latent embeddings, and develop an encoder-diverter-decoder framework whose modules are trained in an end-to-end fashion. With the learned latent mechanisms, for the first time these decomposed modules can be used to encode the input into a mechanism-aware context and to decode responses with controlled generation styles and topics. Finally, experiments with human judgements, intuitive examples and detailed discussions demonstrate the quality and diversity of the generated responses, with a 9.80% increase in acceptable ratio over the best of six baseline methods.
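The diverter step — scoring latent mechanism embeddings against the encoded post and producing a mechanism-aware context for the decoder — could be sketched as follows. The dot-product scoring, the argmax selection, and the concatenation are illustrative assumptions, not the paper's exact model:

```python
import numpy as np

def divert(context, mechanism_embeddings):
    """Toy diverter: softmax-score each latent mechanism against the
    encoded context, pick the most likely one, and return the
    mechanism-aware context to be fed to the decoder."""
    scores = mechanism_embeddings @ context          # one score per mechanism
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                             # softmax over mechanisms
    chosen = mechanism_embeddings[np.argmax(probs)]
    return np.concatenate([context, chosen]), probs
```

Sampling a mechanism instead of taking the argmax would give one way to obtain diverse responses to the same post.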
Pairwise Relationship Guided Deep Hashing for Cross-Modal Retrieval
Yang, Erkun (Xidian University) | Deng, Cheng (Xidian University) | Liu, Wei (Tencent) | Liu, Xianglong (Beihang University) | Tao, Dacheng (University of Technology Sydney) | Gao, Xinbo (Xidian University)
With the benefits of low storage cost and fast query speed, cross-modal hashing has received considerable attention recently. However, almost all existing cross-modal hashing methods fail to obtain powerful hash codes because they directly utilize hand-crafted features or ignore heterogeneous correlations across different modalities, which greatly degrades retrieval performance. In this paper, we propose a novel deep cross-modal hashing method that generates compact hash codes through an end-to-end deep learning architecture, which can effectively capture the intrinsic relationships between modalities. Our architecture integrates different types of pairwise constraints to encourage the similarities of the hash codes from an intra-modal view and an inter-modal view, respectively. Moreover, additional decorrelation constraints are introduced into the architecture, enhancing the discriminative ability of each hash bit. Extensive experiments show that our proposed method yields state-of-the-art results on two cross-modal retrieval datasets.
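The pairwise and decorrelation constraints described above might be combined in an objective along these lines: the inner product of two codes should match a ±1 similarity label both within and across modalities, while an extra penalty discourages correlated hash bits. The exact weighting and loss forms are assumptions for illustration:

```python
import numpy as np

def pairwise_hash_loss(img_codes, txt_codes, sim, beta=0.01):
    """Toy pairwise-constrained hashing objective over ±1 hash codes.
    sim[i, j] is 1 for similar pairs and 0 for dissimilar pairs."""
    nbits = img_codes.shape[1]
    target = 2.0 * sim - 1.0                        # +1 similar, -1 dissimilar
    # Inter-modal and intra-modal pairwise constraints.
    inter = ((img_codes @ txt_codes.T) / nbits - target) ** 2
    intra_i = ((img_codes @ img_codes.T) / nbits - target) ** 2
    intra_t = ((txt_codes @ txt_codes.T) / nbits - target) ** 2
    pair_loss = inter.mean() + intra_i.mean() + intra_t.mean()

    # Decorrelation: penalize off-diagonal entries of the bit covariance.
    c = np.cov(img_codes, rowvar=False)
    decor = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return pair_loss + beta * decor
```

In a real system the ±1 codes would come from the network's (relaxed) binary outputs, and the loss would be minimized end-to-end.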
User Modeling with Neural Network for Review Rating Prediction
Tang, Duyu (Harbin Institute of Technology) | Qin, Bing (Harbin Institute of Technology) | Liu, Ting (Harbin Institute of Technology) | Yang, Yuekui (Tencent)
We present a neural network method for review rating prediction in this paper. Existing neural network methods for sentiment prediction typically capture only the semantics of texts, but ignore the user who expresses the sentiment. This is not desirable for review rating prediction, as each user influences how the textual content of a review should be interpreted. For example, the same word (e.g., good) might indicate different sentiment strengths when written by different users. We address this issue by developing a new neural network that takes user information into account. The intuition is to factor in user-specific modifications to the meaning of a certain word. Specifically, we extend lexical semantic composition models and introduce a user-word composition vector model (UWCVM), which effectively captures how a user acts as a function modifying the continuous word representation. We integrate UWCVM into a supervised learning framework for review rating prediction, and conduct experiments on two benchmark review datasets. Experimental results demonstrate the effectiveness of our method, which shows superior performance over several strong baseline methods.
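The core composition idea — a user acting as a function that modifies a word's continuous representation — admits a very small sketch: model the user as a matrix applied to the word vector. The matrix form and the tanh nonlinearity are assumptions for illustration, not the exact UWCVM formulation:

```python
import numpy as np

def user_word_compose(word_vec, user_matrix):
    """Compose a word representation with a user-specific transformation:
    the user matrix reshapes the word's meaning, and tanh keeps the
    composed vector bounded."""
    return np.tanh(user_matrix @ word_vec)
```

Two different user matrices applied to the same word vector (e.g., "good") would then yield two different composed representations, capturing user-specific sentiment strength.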