Collaborating Authors

University of Science and Technology of China


Sequence-to-Sequence Learning via Shared Latent Representation

AAAI Conferences

Sequence-to-sequence learning is a popular research area in deep learning, covering tasks such as video captioning and speech recognition. Existing methods model this learning as a mapping process: the input sequence is first encoded into a fixed-size vector, and the target sequence is then decoded from that vector. Although simple and intuitive, such a mapping model is task-specific and cannot be reused directly across tasks. In this paper, we propose a star-like framework for general and flexible sequence-to-sequence learning, in which different types of media content (the peripheral nodes) can be encoded to and decoded from a shared latent representation (SLR) (the central node). This is inspired by the fact that the human brain can learn and express an abstract concept in different ways. The media-invariant property of the SLR can be seen as a high-level regularization on the intermediate vector, forcing it not only to capture the latent representation within each individual medium, as auto-encoders do, but also the transitions between media, as mapping models do. Moreover, the SLR model is content-specific, meaning it needs to be trained only once per dataset and can then be used for different tasks. We show how to train an SLR model via dropout and use it for different sequence-to-sequence tasks. Our SLR model is validated on the Youtube2Text and MSR-VTT datasets, achieving superior performance on the video-to-sentence task and the first sentence-to-video results.
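
A minimal sketch of the star-like idea above, assuming PyTorch and illustrative media types, feature sizes, and branch-dropout scheme (one possible reading of the abstract, not the authors' implementation): media-specific encoders map into one shared latent, and media-specific decoders read back from it.

    import torch
    import torch.nn as nn

    class SLR(nn.Module):
        def __init__(self, video_dim=2048, text_dim=512, latent_dim=256):
            super().__init__()
            # Peripheral nodes: one encoder/decoder pair per medium.
            self.enc = nn.ModuleDict({
                "video": nn.GRU(video_dim, latent_dim, batch_first=True),
                "text":  nn.GRU(text_dim,  latent_dim, batch_first=True),
            })
            self.dec = nn.ModuleDict({
                "video": nn.GRU(latent_dim, video_dim, batch_first=True),
                "text":  nn.GRU(latent_dim, text_dim,  batch_first=True),
            })

        def encode(self, medium, seq):
            _, h = self.enc[medium](seq)   # final hidden state = shared latent
            return h.squeeze(0)

        def decode(self, medium, z, steps):
            # Feed the shared latent at every output step (a simple choice).
            out, _ = self.dec[medium](z.unsqueeze(1).repeat(1, steps, 1))
            return out

    model = SLR()
    video = torch.randn(4, 30, 2048)   # a batch of 30-frame video features
    text = torch.randn(4, 12, 512)     # a batch of 12-token text embeddings
    z_v = model.encode("video", video)
    z_t = model.encode("text", text)
    # Media invariance: both latents should decode into either medium. During
    # training, randomly zeroing one input branch per step is one way to read
    # the abstract's "training via dropout".
    loss = nn.functional.mse_loss(z_v, z_t) \
         + nn.functional.mse_loss(model.decode("text", z_v, 12), text)
    loss.backward()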


Lin

AAAI Conferences

Restoring face images from distortions is important in face recognition applications, and it is complicated by multi-scale issues that remain poorly solved. In this paper, we present a Sequential Gating Ensemble Network (SGEN) for the multi-scale face restoration problem. We first bring the principle of ensemble learning into the SGEN architecture design to reinforce the predictive performance of the network. SGEN aggregates multi-level base-encoders and base-decoders into the network, which gives the network receptive fields at multiple scales. Instead of combining these base-en/decoders directly with non-sequential operations, SGEN treats base-en/decoders from different levels as sequential data.
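
A minimal sketch of the sequential gating idea, assuming PyTorch (an illustrative reading of the abstract, not the authors' implementation): same-shaped feature maps from base-en/decoders at successive levels are fused in order through a learned gate, rather than by a one-shot sum or concatenation.

    import torch
    import torch.nn as nn

    class SequentialGate(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.gate = nn.Conv2d(2 * channels, channels, kernel_size=1)

        def forward(self, features):
            # features: list of (B, C, H, W) maps, ordered by level.
            state = features[0]
            for f in features[1:]:
                g = torch.sigmoid(self.gate(torch.cat([state, f], dim=1)))
                state = g * state + (1.0 - g) * f   # gated sequential fusion
            return state

    fuse = SequentialGate(channels=64)
    levels = [torch.randn(2, 64, 32, 32) for _ in range(3)]
    print(fuse(levels).shape)   # torch.Size([2, 64, 32, 32])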


Lateral Inhibition-Inspired Convolutional Neural Network for Visual Attention and Saliency Detection

AAAI Conferences

Lateral inhibition in top-down feedback is widespread in visual neurobiology, but this important mechanism has not yet been well explored in computer vision. In our recent research, we found that modeling lateral inhibition in a convolutional neural network (LICNN) is very useful for visual attention and saliency detection. In this paper, we formulate lateral inhibition, inspired by related studies in neurobiology, and embed it into the top-down gradient computation of a general CNN trained for classification, i.e., only category-level information is used. After this operation (conducted only once), the network is able to generate accurate category-specific attention maps. Further, we apply LICNN to weakly supervised salient object detection. Extensive experimental studies on a set of databases, e.g., ECSSD, HKU-IS, PASCAL-S and DUT-OMRON, demonstrate the great advantage of LICNN, which achieves state-of-the-art performance. It is especially impressive that LICNN, using only category-level supervision, even outperforms some recent methods trained with segmentation-level supervision.
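
A minimal sketch of lateral inhibition applied to top-down contributions, assuming PyTorch (an illustrative reading, not the published LICNN formulation): each spatial unit is suppressed in proportion to the strength of its neighbors, sharpening the category-specific map.

    import torch
    import torch.nn.functional as F

    def inhibit(contrib, strength=0.5):
        # contrib: (B, C, H, W) non-negative top-down contributions, e.g.
        # gradient-times-activation for a chosen class score.
        neighbors = F.avg_pool2d(contrib, 3, stride=1, padding=1) - contrib / 9.0
        return F.relu(contrib - strength * neighbors)

    contrib = torch.rand(1, 64, 14, 14)
    attention = inhibit(contrib).sum(dim=1)   # (1, 14, 14) attention map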


Confidence-Aware Matrix Factorization for Recommender Systems

AAAI Conferences

Collaborative filtering (CF) methods, particularly those based on matrix factorization (MF), have been widely used in recommender systems. The literature reports that matrix factorization methods often achieve superior rating-prediction accuracy. However, existing matrix factorization methods rarely consider the confidence of their rating predictions and thus cannot support advanced recommendation tasks. In this paper, we propose a Confidence-aware Matrix Factorization (CMF) framework that simultaneously optimizes the accuracy of rating prediction and measures the prediction confidence within the model. Specifically, we introduce variance parameters for both users and items in the matrix factorization process. A prediction interval can then be computed to measure the confidence of each predicted rating. These confidence quantities can be used to improve recommendation results through Confidence-aware Ranking (CR). We also develop two effective implementations of our framework that compute the confidence-aware matrix factorization for large-scale data. Finally, extensive experiments on three real-world datasets demonstrate the effectiveness of our framework from multiple perspectives.
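
A minimal sketch of the confidence idea, assuming additive per-user and per-item variances and standard normal intervals (the paper's exact model may differ):

    import numpy as np

    rng = np.random.default_rng(0)
    k, n_users, n_items = 8, 100, 50
    P = rng.normal(size=(n_users, k))    # user latent factors
    Q = rng.normal(size=(n_items, k))    # item latent factors
    var_u = np.full(n_users, 0.3)        # per-user variance parameters
    var_i = np.full(n_items, 0.2)        # per-item variance parameters

    def predict_with_interval(u, i, z=1.96):
        r_hat = P[u] @ Q[i]
        sigma = np.sqrt(var_u[u] + var_i[i])   # assumed additive variances
        return r_hat, (r_hat - z * sigma, r_hat + z * sigma)

    r, (lo, hi) = predict_with_interval(3, 7)
    # Confidence-aware ranking could, for example, order items by the lower
    # bound lo instead of the point estimate r.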


Dual Transfer Learning for Neural Machine Translation with Marginal Distribution Regularization

AAAI Conferences

Neural machine translation (NMT) heavily relies on parallel bilingual data for training. Since large-scale, high-quality parallel corpora are usually costly to collect, it is appealing to exploit monolingual corpora to improve NMT. Inspired by the law of total probability, which connects the probability of a given target-side monolingual sentence to the conditional probability of translating from a source sentence to that target sentence, we propose to explicitly exploit this connection to learn from, and regularize the training of, NMT models using monolingual data. The key technical challenge of this approach is that computing the sum of conditional probabilities requires enumerating the exponentially many possible source sentences for a target monolingual sentence. We address this challenge by leveraging the dual translation model (target-to-source translation) to sample several of the most likely source-side sentences, avoiding enumeration of all possible candidates. That is, we transfer the knowledge contained in the dual model to boost the training of the primal model (source-to-target translation); we call this approach dual transfer learning. Experimental results on English-French and German-English tasks demonstrate that dual transfer learning achieves significant improvements over several strong baselines and obtains new state-of-the-art results.
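
A sketch of the sampling step implied above, written in LaTeX (notation assumed here: P(x) is a source-side language model and the dual model P(x|y) serves as the importance-sampling proposal; the paper's exact regularizer may differ):

    \[
    P(y) = \sum_{x} P(x)\,P(y \mid x;\theta)
         = \mathbb{E}_{x \sim P(x \mid y)}\!\left[\frac{P(x)\,P(y \mid x;\theta)}{P(x \mid y)}\right]
         \approx \frac{1}{K}\sum_{i=1}^{K}\frac{P(x_i)\,P(y \mid x_i;\theta)}{P(x_i \mid y)},
    \quad x_i \sim P(x \mid y).
    \]

The regularizer then penalizes the gap between this estimate and the empirical marginal of y on the monolingual corpus.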


Privacy-Preserving Policy Iteration for Decentralized POMDPs

AAAI Conferences

We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach.
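
A minimal sketch of the cross-entropy method underlying the policy optimization above, with the cryptographic layer omitted (in the paper, the statistics agents exchange would travel as ciphertexts under a public-key homomorphic cryptosystem so that sums can be computed without decryption); the toy score function stands in for trial-based reward estimates:

    import numpy as np

    rng = np.random.default_rng(0)

    def score(theta):
        # Stand-in for the expected team reward of policy parameters theta,
        # estimated from trials (simulated episodes) in the real algorithm.
        return -np.sum((theta - 0.7) ** 2)

    mu, sigma = np.zeros(4), np.ones(4)   # sampling distribution over policies
    for _ in range(50):
        thetas = rng.normal(mu, sigma, size=(64, 4))  # candidate policies
        scores = np.array([score(t) for t in thetas])
        elite = thetas[np.argsort(scores)[-6:]]       # keep the best trials
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    print(np.round(mu, 2))   # converges near the optimum [0.7 0.7 0.7 0.7]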


Joint Training for Neural Machine Translation Models with Monolingual Data

AAAI Conferences

Monolingual data have been shown to help improve the translation quality of both statistical machine translation (SMT) and neural machine translation (NMT) systems, especially in resource-poor or domain adaptation settings where parallel data are scarce. In this paper, we propose a novel approach to better leverage monolingual data for neural machine translation by jointly learning the source-to-target and target-to-source NMT models of a language pair with a joint EM optimization method. The training process starts with two initial NMT models pre-trained on parallel data, one for each direction, and these two models are iteratively updated by incrementally decreasing the translation losses on the training data. In each iteration, both NMT models first translate monolingual data from one language to the other, forming pseudo-training data for the other NMT model. Then two new NMT models are learned from the parallel data together with the pseudo-training data. Both NMT models are expected to improve, so that better pseudo-training data can be generated in the next iteration. Experimental results on Chinese-English and English-German translation tasks show that our approach simultaneously improves the translation quality of the source-to-target and target-to-source models, significantly outperforming strong baseline systems that are enhanced with monolingual data, including back-translation.
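
A minimal sketch of one joint round, with placeholder model objects (ToyNMT and its methods are illustrative, not a real API): each direction translates monolingual text to create pseudo-parallel data for the other direction, then both are retrained on real plus pseudo pairs.

    class ToyNMT:
        # Placeholder standing in for an NMT model; not a real library API.
        def __init__(self, name):
            self.name, self.data = name, []
        def translate(self, s):
            return "<%s: %s>" % (self.name, s)
        def train(self, pairs):
            self.data = pairs   # a real model would run gradient updates here

    def joint_round(src2tgt, tgt2src, mono_src, mono_tgt, parallel):
        # E-like step: build pseudo pairs with the current models.
        pseudo_s2t = [(tgt2src.translate(y), y) for y in mono_tgt]  # (x', y)
        pseudo_t2s = [(src2tgt.translate(x), x) for x in mono_src]  # (y', x)
        # M-like step: retrain each direction on real + pseudo data.
        src2tgt.train(parallel + pseudo_s2t)
        tgt2src.train([(y, x) for (x, y) in parallel] + pseudo_t2s)

    s2t, t2s = ToyNMT("zh-en"), ToyNMT("en-zh")
    joint_round(s2t, t2s, ["zh sentence"], ["en sentence"], [("zh", "en")])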


How Images Inspire Poems: Generating Classical Chinese Poetry from Images with Memory Networks

AAAI Conferences

With recent advances in neural models and natural language processing, the automatic generation of classical Chinese poetry has drawn significant attention due to its artistic and cultural value. Previous work mainly focuses on generating poetry given keywords or other textual information, while visual inspiration for poetry has rarely been explored. Generating poetry from images is much more challenging than generating poetry from text, since images contain very rich visual information that cannot be described completely with a few keywords, and a good poem should convey the image accurately. In this paper, we propose a memory-based neural model that exploits images to generate poems. Specifically, an Encoder-Decoder model with a topic memory network is proposed to generate classical Chinese poetry from images. To the best of our knowledge, this is the first work attempting to generate classical Chinese poetry from images with neural networks. A comprehensive experimental investigation, with both human evaluation and quantitative analysis, demonstrates that the proposed model can generate poems that convey images accurately.
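
A minimal sketch of a topic memory lookup, assuming PyTorch (an illustrative reading of the abstract, not the authors' architecture): the image feature queries a memory of topic embeddings by attention, and the read vector conditions the poem decoder.

    import torch
    import torch.nn.functional as F

    n_topics, d = 200, 256
    memory = torch.randn(n_topics, d)   # topic/keyword embeddings
    img_query = torch.randn(1, d)       # CNN image feature projected to d dims
    attn = F.softmax(img_query @ memory.t() / d ** 0.5, dim=-1)  # (1, n_topics)
    topic_context = attn @ memory       # (1, d), fed to the decoder each step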


Guo

AAAI Conferences

Continuous sign language translation (SLT) is a challenging task due to its particular linguistics: sequential gesture variation without word alignment. Current hybrid models based on HMMs and CTC (connectionist temporal classification) are designed to solve frame- or word-level alignment, but they may fail in cases where the word order of a sentence does not match the order of the visual content. To address this issue, this paper proposes a hierarchical-LSTM (HLSTM) encoder-decoder model with visual content and word embeddings for SLT. It handles different granularities by conveying spatio-temporal transitions among frames, clips and viseme units. It first extracts spatio-temporal cues from video clips with a 3D CNN and packs appropriate visemes via online key-clip mining with adaptive variable length. After pooling the recurrent outputs of the top layer of the HLSTM, a temporal attention-aware weighting mechanism is proposed to balance the intrinsic relationships among viseme source positions. Finally, another two LSTM layers are used to separately recurse over viseme vectors and translate the semantics.
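
A minimal sketch of the temporal attention-aware weighting, assuming PyTorch (an illustrative reading, not the published model): viseme vectors from the top HLSTM layer are pooled with learned attention weights before the final translation LSTMs.

    import torch
    import torch.nn as nn

    d = 512
    score = nn.Linear(d, 1)              # learned per-viseme relevance score
    visemes = torch.randn(1, 9, d)       # (batch, viseme positions, dim)
    w = torch.softmax(score(visemes).squeeze(-1), dim=-1)  # (1, 9) weights
    context = (w.unsqueeze(-1) * visemes).sum(dim=1)       # (1, d) pooled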