The University of Hong Kong
Search Engine Guided Neural Machine Translation
Gu, Jiatao (The University of Hong Kong) | Wang, Yong (The University of Hong Kong) | Cho, Kyunghyun (New York University) | Li, Victor O.K. (The University of Hong Kong)
Neural machine translation is a recently proposed paradigm in machine translation, where a single neural network, often consisting of encoder and decoder recurrent networks, is trained end-to-end to map from a source sentence to its corresponding translation (Bahdanau, Cho, and Bengio 2014; Cho et al. 2014; Sutskever, Vinyals, and Le 2014; Kalchbrenner and Blunsom 2013). The success of neural machine translation, which has already been adopted by major industry players in machine translation (Wu et al. 2016; Crego et al. 2016), is often attributed to the advances in building and training recurrent networks as well as the availability of large-scale parallel corpora for machine translation. A major technical challenge, other than designing such a neural machine translation system, is the scale of a training parallel corpus which often consists of hundreds of thousands to millions of sentence pairs. We address this issue by incorporating an off-the-shelf black-box search engine into the proposed neural machine translation system. The proposed approach first queries a search engine, which indexes a whole training set, with a given source sentence, and the proposed neural translation system translates the source sentence while incorporating all the retrieved training sentence pairs. In this way, the proposed translation system automatically adapts to
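The retrieval step described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the abstract treats the search engine as an off-the-shelf black box, so the token-overlap scoring, the function names (`tokenize`, `overlap_score`, `retrieve`) and the parameter `k` are all assumptions made here for illustration only.

```python
from collections import Counter


def tokenize(sentence):
    """Lowercase whitespace tokenizer (an assumption; real systems differ)."""
    return sentence.lower().split()


def overlap_score(query_tokens, candidate_tokens):
    """Dice-style bag-of-tokens overlap, a stand-in for a search engine's ranking."""
    query_counts = Counter(query_tokens)
    candidate_counts = Counter(candidate_tokens)
    shared = sum((query_counts & candidate_counts).values())
    total = len(query_tokens) + len(candidate_tokens)
    return 2.0 * shared / total if total else 0.0


def retrieve(source, training_pairs, k=2):
    """Return up to k (source, target) training pairs most similar to `source`."""
    query = tokenize(source)
    scored = [
        (overlap_score(query, tokenize(src)), src, tgt)
        for src, tgt in training_pairs
    ]
    # Stable sort on the score only, so ties keep their corpus order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(src, tgt) for score, src, tgt in scored[:k] if score > 0]


# Hypothetical toy corpus, for illustration only.
pairs = [
    ("the cat sat", "le chat s'est assis"),
    ("a dog barked", "un chien a aboye"),
    ("the cat slept", "le chat a dormi"),
]
retrieved = retrieve("the cat sat down", pairs, k=2)
```

In such a setup the retrieved pairs would be passed to the translation model as additional context alongside the source sentence; how the model consumes them is not specified in the text above.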
Neural Machine Translation with Gumbel-Greedy Decoding
Gu, Jiatao (The University of Hong Kong) | Im, Daniel Jiwoong (AIFounded Inc.) | Li, Victor O.K. (The University of Hong Kong)
Previous neural machine translation models used heuristic search algorithms (e.g., beam search) to avoid solving the maximum a posteriori problem over translation sentences at test time. In this paper, we propose Gumbel-Greedy Decoding, which trains a generative network to predict translations under a trained model. We solve this problem using the Gumbel-Softmax reparameterization, which makes our generative network differentiable and trainable through standard stochastic gradient methods. We empirically demonstrate that our proposed model is effective for generating sequences of discrete words.
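As a rough illustration of the Gumbel-Softmax reparameterization mentioned above, the sketch below draws one relaxed sample over a small vocabulary. It shows the generic relaxation only, not the paper's decoder; the function names, the logits, and the temperature values are assumptions chosen for this example.

```python
import math
import random


def sample_gumbel(rng):
    """Draw Gumbel(0, 1) noise as -log(-log(U)) with U ~ Uniform(0, 1)."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, temperature, rng):
    """Relaxed categorical sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = [(logit + sample_gumbel(rng)) / temperature for logit in logits]
    return softmax(noisy)


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])


logits = [2.0, 0.5, -1.0]  # hypothetical unnormalized word scores
soft = gumbel_softmax(logits, temperature=1.0, rng=random.Random(0))
sharp = gumbel_softmax(logits, temperature=0.1, rng=random.Random(0))
```

With the same noise, lowering the temperature pushes the sample toward a one-hot vector without changing which entry is largest. The output stays a smooth function of the logits, which is what allows gradients to pass through the sampling step.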
Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition
Liu, Wei (The University of Hong Kong) | Chen, Chaofeng (The University of Hong Kong) | Wong, Kwan-Yee K. (The University of Hong Kong)
In this paper, we present a Character-Aware Neural Network (Char-Net) for recognizing distorted scene text. Our Char-Net is composed of a word-level encoder, a character-level encoder, and an LSTM-based decoder. Unlike previous work which employed a global spatial transformer network to rectify the entire distorted text image, we take an approach of detecting and rectifying individual characters. To this end, we introduce a novel hierarchical attention mechanism (HAM) which consists of a recurrent RoIWarp layer and a character-level attention layer. The recurrent RoIWarp layer sequentially extracts a feature region corresponding to a character from the feature map produced by the word-level encoder, and feeds it to the character-level encoder, which removes the distortion of the character through a simple spatial transformer and further encodes the character region. The character-level attention layer then attends to the most relevant features of the feature map produced by the character-level encoder and composes a context vector, which is finally fed to the LSTM-based decoder for decoding. This approach of adopting a simple local transformation to model the distortion of individual characters not only improves efficiency, but can also handle different types of distortion that are hard, if not impossible, to model with a single global transformation. Experiments have been conducted on six public benchmark datasets. Our results show that Char-Net can achieve state-of-the-art performance on all the benchmarks, especially on the IC-IST which contains scene text with large distortion. Code will be made available.
Zero-Resource Neural Machine Translation with Multi-Agent Communication Game
Chen, Yun (The University of Hong Kong) | Liu, Yang (Tsinghua University) | Li, Victor O.K. (The University of Hong Kong)
While end-to-end neural machine translation (NMT) has achieved notable success in recent years in translating a handful of resource-rich language pairs, it still suffers from the data scarcity problem for low-resource language pairs and domains. To tackle this problem, we propose an interactive multimodal framework for zero-resource neural machine translation. Instead of being passively exposed to large amounts of parallel corpora, our learners (implemented as encoder-decoder architectures) engage in cooperative image description games, and thus develop their own image captioning or neural machine translation model from the need to communicate in order to succeed at the game. Experimental results on the IAPR-TC12 and Multi30K datasets show that the proposed learning mechanism significantly improves over the state-of-the-art methods.
On Multi-Relational Link Prediction With Bilinear Models
Wang, Yanjie (University of Mannheim) | Gemulla, Rainer (University of Mannheim) | Li, Hui (The University of Hong Kong)
We study bilinear embedding models for the task of multi-relational link prediction and knowledge graph completion. Bilinear models are among the most basic models for this task: they are comparably efficient to train and use, and they can provide good prediction performance. The main goal of this paper is to explore the expressiveness of and the connections between various bilinear models proposed in the literature. In particular, a substantial number of models can be represented as bilinear models with certain additional constraints enforced on the embeddings. We explore whether or not these constraints lead to universal models, which can in principle represent every set of relations, and whether or not there are subsumption relationships between various models. We report results of an independent experimental study that evaluates recent bilinear models in a common experimental setup. Finally, we provide evidence that relation-level ensembles of multiple bilinear models can achieve state-of-the-art prediction performance.
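To make the idea of "bilinear models with additional constraints" concrete, here is a minimal sketch of a bilinear triple score and one constrained variant. The diagonal restriction on the relation matrix (as in DistMult) is used only as a familiar example of such a constraint and is not named in the abstract; all function names and numbers below are assumptions for illustration.

```python
def bilinear_score(subject, relation, obj):
    """Unconstrained bilinear score: subject^T * relation * obj."""
    return sum(
        subject[i] * relation[i][j] * obj[j]
        for i in range(len(subject))
        for j in range(len(obj))
    )


def diagonal_score(subject, relation_diag, obj):
    """Bilinear score when the relation matrix is constrained to be diagonal."""
    return sum(s * r * o for s, r, o in zip(subject, relation_diag, obj))


def to_matrix(diag):
    """Expand a diagonal into the full square matrix it represents."""
    n = len(diag)
    return [[diag[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical 2-dimensional embeddings, for illustration only.
subject = [1.0, 2.0]
obj = [3.0, -1.0]
full_relation = [[0.5, 1.0], [0.0, 2.0]]
diag_relation = [0.5, 2.0]

full = bilinear_score(subject, full_relation, obj)
constrained = diagonal_score(subject, diag_relation, obj)
```

The diagonal variant is the full model with off-diagonal entries fixed to zero, so it is subsumed by the unconstrained one. Its score is also unchanged when subject and object are swapped, which is one simple way a constraint can limit the relations a model is able to represent.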
Attention Based LSTM for Target Dependent Sentiment Classification
Yang, Min (The University of Hong Kong) | Tu, Wenting (The University of Hong Kong) | Wang, Jingxuan (The University of Hong Kong) | Xu, Fei (Chinese Academy of Sciences) | Chen, Xiaojun (Shenzhen University)
We present an attention-based bidirectional LSTM approach to improve target-dependent sentiment classification. Our method learns the alignment between the target entities and the most distinguishing features. We conduct extensive experiments on a real-life dataset. The experimental results show that our model achieves state-of-the-art results.
Authorship Attribution with Topic Drift Model
Yang, Min (The University of Hong Kong) | Zhu, Dingju (South China Normal University) | Tang, Yong (South China Normal University) | Wang, Jingxuan (The University of Hong Kong)
Authorship attribution is an active research direction due to its legal and financial importance. The goal is to identify the authorship of anonymous texts. In this paper, we propose a Topic Drift Model (TDM), monitoring the dynamicity of authors' writing style and latent topics of interest. Our model is sensitive to the temporal information and the ordering of words, and thus extracts more information from texts.
Detecting Review Spammer Groups
Yang, Min (The University of Hong Kong) | Lu, Ziyu (The University of Hong Kong) | Chen, Xiaojun (Shenzhen University) | Xu, Fei (Chinese Academy of Sciences)
With an increasing number of paid writers posting fake reviews to promote or demote target entities on the Internet, review spammer detection has become a crucial and challenging task. In this paper, we propose a three-phase method to address the problem of identifying review spammer groups and individual spammers, who get paid for posting fake comments. We evaluate the effectiveness and performance of the approach on a real-life online shopping review dataset from amazon.com. The experimental results show that our model achieves comparable or better performance than previous work on spammer detection.
Efficient Delivery Policy to Minimize User Traffic Consumption in Guaranteed Advertising
Zhang, Jia (Chinese Academy of Sciences and University of Chinese Academy of Sciences) | Wang, Zheng (The University of Hong Kong) | Li, Qian (Chinese Academy of Sciences and University of Chinese Academy of Sciences) | Zhang, Jialin (Chinese Academy of Sciences and University of Chinese Academy of Sciences) | Lan, Yanyan (Chinese Academy of Sciences and University of Chinese Academy of Sciences) | Li, Qiang (Chinese Academy of Sciences and University of Chinese Academy of Sciences) | Sun, Xiaoming (CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences)
In this work, we study the guaranteed delivery model, which is widely used in online advertising. In the guaranteed delivery scenario, ad exposures (also called impressions in some works) to users are guaranteed by contracts signed in advance between advertisers and publishers. A crucial problem for the advertising platform is how to fully utilize the valuable user traffic to generate as much revenue as possible. Unlike previous works, which usually minimize the penalty of unsatisfied contracts and some other cost (e.g., representativeness), we propose a novel consumption minimization model, in which the primary objective is to minimize the user traffic consumed to satisfy all contracts. Under this model, we develop a near-optimal method to deliver ads to users. The main advantage of our method is that it consumes nearly as little user traffic as possible to satisfy all contracts, so more contracts can be accepted to produce more revenue. It also enables publishers to estimate how much user traffic is redundant or short so that they can sell or buy this part of the traffic in bulk in the exchange market. Furthermore, it is robust with regard to prior knowledge of the user type distribution. Finally, simulations show that our method outperforms the traditional state-of-the-art methods.
Exploring Efficient Strategies for Minesweeper
Tu, Jinzheng (Tsinghua University) | Li, Tianhong (Tsinghua University) | Chen, Shiteng (Institute of Software, Chinese Academy of Sciences) | Zu, Chong (University of California, Berkeley) | Gu, Zhaoquan (The University of Hong Kong)
Minesweeper is a famous single-player computer game, in which the grid of blocks contains some mines and the player is to uncover (probe) all blocks that do not contain any mines. Many heuristic strategies have been proposed to play the game, but the rate of success is not high. In this paper, we explore efficient strategies for the Minesweeper game. First, we show a counterintuitive result that probing the corner blocks could increase the rate of success. Then, we present a series of heuristic strategies, and the combination of them could lead to better results. We also transplant the optimal procedure on the basis of our proposed methods, and it achieves the highest rate of success. In extensive simulations, a combination of heuristic strategies, "PSEQ", yields a success rate of 81.627(8)%, 78.122(8)%, and 39.616(5)% for beginner, intermediate, and expert levels respectively, outperforming the state-of-the-art strategies. Moreover, the developed quasi-optimal methods, combining the optimal procedure and our heuristic methods, raise the success rate to at least 81.79(2)%, 78.22(3)%, and 40.06(2)% respectively.