Goto

Collaborating Authors

 Xiamen University


Asynchronous Bidirectional Decoding for Neural Machine Translation

AAAI Conferences

The dominant neural machine translation (NMT) models apply unified attentional encoder-decoder neural networks for translation. Traditionally, the NMT decoders adopt recurrent neural networks (RNNs) to perform translation in a left-to-right manner, leaving the target-side contexts generated from right to left unexploited during translation. In this paper, we equip the conventional attentional encoder-decoder NMT framework with a backward decoder, in order to explore bidirectional decoding for NMT. Attending to the hidden state sequence produced by the encoder, our backward decoder first learns to generate the target-side hidden state sequence from right to left. Then, the forward decoder performs translation in the forward direction, while in each translation prediction timestep, it simultaneously applies two attention models to consider the source-side and reverse target-side hidden states, respectively. With this new architecture, our model is able to fully exploit source- and target-side contexts to improve translation quality altogether. Experimental results on NIST Chinese-English and WMT English-German translation tasks demonstrate that our model achieves substantial improvements over the conventional NMT by 3.14 and 1.38 BLEU points, respectively. The source code of this work can be obtained from https://github.com/DeepLearnXMU/ABDNMT.


Compressed Sensing MRI Using a Recursive Dilated Network

AAAI Conferences

Compressed sensing magnetic resonance imaging (CS-MRI) is an active research topic in the ๏ฌeld of inverse problems. Conventional CS-MRI algorithms usually exploit the sparse nature of MRI in an iterative manner. These optimization-based CS-MRI methods are often time-consuming at test time, and are based on ๏ฌxed transform bases or shallow dictionaries, which limits modeling capacity. Recently, deep models have been introduced to the CS-MRI problem. One main challenge for CS-MRI methods based on deep learning is the trade off between model performance and network size. We propose a recursive dilated network (RDN) for CS-MRI that achieves good performance while reducing the number of network parameters. We adopt dilated convolutions in each recursive block to aggregate multi-scale information within the MRI. We also adopt a modi๏ฌed shortcut strategy to help features ๏ฌ‚ow into deeper layers. Experimental results show that the proposed RDN model achieves state-of-the-art performance in CS-MRI while using far fewer parameters than previously required.


Deep Semantic Role Labeling With Self-Attention

AAAI Conferences

Semantic Role Labeling (SRL) is believed to be a crucial step towards natural language understanding and has been widely studied. Recent years, end-to-end SRL with recurrent neural networks (RNN) has gained increasing attention. However, it remains a major challenge for RNNs to handle structural information and long range dependencies. In this paper, we present a simple and effective architecture for SRL which aims to address these problems. Our model is based on self-attention which can directly capture the relationships between two tokens regardless of their distance. Our single model achieves F1=83.4 on the CoNLL-2005 shared task dataset and F1=82.7 on the CoNLL-2012 shared task dataset, which outperforms the previous state-of-the-art results by 1.8 and 1.0 F1 score respectively. Besides, our model is computationally efficient, and the parsing speed is 50K tokens per second on a single Titan X GPU.


Variational Recurrent Neural Machine Translation

AAAI Conferences

Partially inspired by successful applications of variational recurrent neural networks, we propose a novel variational recurrent neural machine translation (VRNMT) model in this paper. Different from the variational NMT, VRNMT introduces a series of latent random variables to model the translation procedure of a sentence in a generative way, instead of a single latent variable. Specifically, the latent random variables are included into the hidden states of the NMT decoder with elements from the variational autoencoder. In this way, these variables are recurrently generated, which enables them to further capture strong and complex dependencies among the output translations at different timesteps. In order to deal with the challenges in performing efficient posterior inference and large-scale training during the incorporation of latent variables, we build a neural posterior approximator, and equip it with a reparameterization technique to estimate the variational lower bound. Experiments on Chinese-English and English-German translation tasks demonstrate that the proposed model achieves significant improvements over both the conventional and variational NMT models.


BattRAE: Bidimensional Attention-Based Recursive Autoencoders for Learning Bilingual Phrase Embeddings

AAAI Conferences

In this paper, we propose a bidimensional attention based recursiveautoencoder (BattRAE) to integrate clues and sourcetargetinteractions at multiple levels of granularity into bilingualphrase representations. We employ recursive autoencodersto generate tree structures of phrases with embeddingsat different levels of granularity (e.g., words, sub-phrases andphrases). Over these embeddings on the source and targetside, we introduce a bidimensional attention network to learntheir interactions encoded in a bidimensional attention matrix,from which we extract two soft attention weight distributionssimultaneously. These weight distributions enableBattRAE to generate compositive phrase representations viaconvolution. Based on the learned phrase representations, wefurther use a bilinear neural model, trained via a max-marginmethod, to measure bilingual semantic similarity. To evaluatethe effectiveness of BattRAE, we incorporate this semanticsimilarity as an additional feature into a state-of-the-art SMTsystem. Extensive experiments on NIST Chinese-English testsets show that our model achieves a substantial improvementof up to 1.63 BLEU points on average over the baseline.


Lattice-Based Recurrent Neural Network Encoders for Neural Machine Translation

AAAI Conferences

Neural machine translation (NMT) heavily relies on word-level modelling to learn semantic representations of input sentences.However, for languages without natural word delimiters (e.g., Chinese) where input sentences have to be tokenized first,conventional NMT is confronted with two issues:1) it is difficult to find an optimal tokenization granularity for source sentence modelling, and2) errors in 1-best tokenizations may propagate to the encoder of NMT.To handle these issues, we propose word-lattice based Recurrent Neural Network (RNN) encoders for NMT,which generalize the standard RNN to word lattice topology.The proposed encoders take as input a word lattice that compactly encodes multiple tokenizations, and learn to generate new hidden states from arbitrarily many inputs and hidden states in preceding time steps.As such, the word-lattice based encoders not only alleviate the negative impact of tokenization errors but also are more expressive and flexible to embed input sentences.Experiment results on Chinese-English translation demonstrate the superiorities of the proposed encoders over the conventional encoder.


ESPACE: Accelerating Convolutional Neural Networks via Eliminating Spatial and Channel Redundancy

AAAI Conferences

Recent years have witnessed an extensive popularity of convolutional neural networks (CNNs) in various computer vision and artificial intelligence applications. However, the performance gains have come at a cost of substantially intensive computation complexity, which prohibits its usage inresource-limited applications like mobile or embedded devices. While increasing attention has been paid to the acceleration of internal network structure, the redundancy of visual input is rarely considered. In this paper, we make the first attempt of reducing spatial and channel redundancy directly from the visual input for CNNs acceleration. The proposed method, termed ESPACE (Elimination of SPAtial and Channel rEdundancy), works by the following three steps: First, the 3D channel redundancy of convolutional layers is reduced by a set of low-rank approximation of convolutional filters. Second, a novel mask based selective processing scheme is proposed, which further speedups the convolution operations via skipping unsalient spatial locations of the visual input. Third, the accelerated network is fine-tuned using the training data via back-propagation. The proposed method is evaluated on ImageNet 2012 with implementations on two widely adopted CNNs, i.e. AlexNet and GoogLeNet. In comparison to several recent methods of CNN acceleration, the proposed scheme has demonstrated new state-of-the-art acceleration performance by a factor of 5.48* and 4.12* speedup on AlexNet and GoogLeNet, respectively, with a minimal decrease in classification accuracy.


Auto-Annotation of 3D Objects via ImageNet

AAAI Conferences

Automatic annotation of 3D objects in cluttered scenes shows its great importance to a variety of applications. Nowadays, 3D point clouds, a new 3D representation of real-world objects, can be easily and rapidly collected by mobile LiDAR systems, e.g. RIEGL VMX-450 system. Moreover, the mobile LiDAR system can also provide a series of consecutive multi-view images which are calibrated with 3D point clouds. This paper proposes to automatically annotate 3D objects of interest in point clouds of road scenes by exploiting a multitude of annotated images in image databases, such as LabelMe and ImageNet. In the proposed method, an object detector trained on the annotated images is used to locate the object regions in acquired multi-view images. Then, based on the correspondences between multi-view images and 3D point clouds, a probabilistic graphical model is used to model the temporal, spatial and geometric constraints to extract the 3D objects automatically. A new dataset was built for evaluation and the experimental results demonstrate a satisfied performance on 3D object extraction.


Ordinal Constrained Binary Code Learning for Nearest Neighbor Search

AAAI Conferences

Recent years have witnessed extensive attention in binary code learning, a.k.a. hashing, for nearest neighbor search problems. It has been seen that high-dimensional data points can quantize into binary codes to give an efficient similarity approximation via Hamming distance. Among the existing schemes, ranking-based hashing is recent promising that targets at preserving ordinal relations of ranking in the Hamming space to minimize retrieval loss. However, the size of the ranking tuples that show the ordinal relations, is quadratic or cubic to the size of training samples. It is so very expensive to embed such ranking tuples in binary code learning, especially given a large-scale training data set. Besides, it remains difficult to build ranking tuples efficiently for most ranking-preserving hashing, which are deployed over an ordinal graph-based setting. To handle these problems, we propose a novel ranking-preserving hashing method, dubbed Ordinal Constraint Hashing (OCH), which efficiently learns the optimal hashing functions with a graph-based approximation to embed the ordinal relations. The core idea is to reduce the size of ordinal graph with ordinal constraint projection, which preserves the ordinal relations through a small data set (such as clusters or random samples). In particular, to learn such hash functions effectively, we further relax the discrete constraints and design a specific stochastic gradient decent algorithm for optimization. Experimental results on three large-scale visual search benchmark datasets, i.e. LabelMe, Tiny100K and GIST1M, show that the proposed OCH method can achieve superior performance over the state-of-the-arts approaches.


Towards Domain Adaptive Vehicle Detection in Satellite Image by Supervised Super-Resolution Transfer

AAAI Conferences

Vehicle detection in satellite image has attracted extensive research attentions with various emerging applications.However, the detector performance has been significantly degenerated due to the low resolutions of satellite images, as well as the limited training data.In this paper, a robust domain-adaptive vehicle detection framework is proposed to bypass both problems.Our innovation is to transfer the detector learning to the high-resolution aerial image domain,where rich supervision exists and robust detectors can be trained.To this end, we first propose a super-resolution algorithm using coupled dictionary learning to ``augment'' the satellite image region being tested into the aerial domain.Notably, linear detection loss is embedded into the dictionary learning, which enforces the augmented region to be sensitive to the subsequent detector training.Second, to cope with the domain changes, we propose an instance-wised detection using Exemplar Support Vector Machines (E-SVMs), which well handles the intra-class and imaging variations like scales, rotations, and occlusions.With comprehensive experiments on large-scale satellite image collections, we demonstrate that the proposed framework can significantly boost the detection accuracy over several state-of-the-arts.