Plotting

 Liu, Yang


Bilingual Lexicon Induction from Non-Parallel Data with Minimal Supervision

AAAI Conferences

Building bilingual lexica from non-parallel data is a long-standing natural language processing research problem that could benefit thousands of resource-scarce languages which lack parallel data. Recent advances of continuous word representations have opened up new possibilities for this task, e.g. by establishing cross-lingual mapping between word embeddings via a seed lexicon. The method is however unreliable when there are only a limited number of seeds, which is a reasonable setting for resource-scarce languages. We tackle the limitation by introducing a novel matching mechanism into bilingual word representation learning. It captures extra translation pairs exposed by the seeds to incrementally improve the bilingual word embeddings. In our experiments, we find the matching mechanism to substantially improve the quality of the bilingual vector space, which in turn allows us to induce better bilingual lexica with seeds as few as 10.


A Bandit Framework for Strategic Regression

Neural Information Processing Systems

We consider a learner's problem of acquiring data dynamically for training a regression model, where the training data are collected from strategic data sources. A fundamental challenge is to incentivize data holders to exert effort to improve the quality of their reported data, despite that the quality is not directly verifiable by the learner. In this work, we study a dynamic data acquisition process where data holders can contribute multiple times. Using a bandit framework, we leverage on the long-term incentive of future job opportunities to incentivize high-quality contributions. We propose a Strategic Regression-Upper Confidence Bound (SR-UCB) framework, an UCB-style index combined with a simple payment rule, where the index of a worker approximates the quality of his past contributions and is used by the learner to determine whether the worker receives future work. For linear regression and certain family of non-linear regression problems, we show that SR-UCB enables a $O(\sqrt{\log T/T})$-Bayesian Nash Equilibrium (BNE) where each worker exerting a target effort level that the learner has chosen, with $T$ being the number of data acquisition stages. The SR-UCB framework also has some other desirable properties: (1) The indexes can be updated in an online fashion (hence computationally light). (2) A slight variant, namely Private SR-UCB (PSR-UCB), is able to preserve $(O(\log^{-1} T), O(\log^{-1} T))$-differential privacy for workers' data, with only a small compromise on incentives (achieving $O(\log^{6} T/\sqrt{T})$-BNE).


Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening

arXiv.org Machine Learning

We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and accuracy.


To Swap or Not to Swap? Exploiting Dependency Word Pairs for Reordering in Statistical Machine Translation

AAAI Conferences

Reordering poses a major challenge in machine translation (MT) between two languages with significant differences in word order. In this paper, we present a novel reordering approach utilizing sparse features based on dependency word pairs. Each instance of these features captures whether two words, which are related by a dependency link in the source sentence dependency parse tree, follow the same order or are swapped in the translation output. Experiments on Chinese-to-English translation show a statistically significant improvement of 1.21 BLEU point using our approach, compared to a state-of-the-art statistical MT system that incorporates prior reordering approaches.


Finding One's Best Crowd: Online Learning By Exploiting Source Similarity

AAAI Conferences

We consider an online learning problem (classification or prediction) involving disparate sources of sequentially arriving data, whereby a user over time learns the best set of data sources to use in constructing the classifier by exploiting their similarity. We first show that, when (1) the similarity information among data sources is known, and (2) data from different sources can be acquired without cost, then a judicious selection of data from different sources can effectively enlarge the training sample size compared to using a single data source, thereby improving the rate and performance of learning; this is achieved by bounding the classification error of the resulting classifier. We then relax assumption (1) and characterize the loss in learning performance when the similarity information must also be acquired through repeated sampling. We further relax both (1) and (2) and present a cost-efficient algorithm that identifies a best crowd from a potentially large set of data sources in terms of both classifier performance and data acquisition cost. This problem has various applications, including online prediction systems with time series data of various forms, such as financial markets, advertisement and network measurement.


Is It Harmful When Advisors Only Pretend to Be Honest?

AAAI Conferences

In trust systems, unfair rating attacks — where advisors provide ratings dishonestly — influence the accuracy of trust evaluation. A secure trust system should function properly under all possible unfair rating attacks; including dynamic attacks. In the literature, camouflage attacks are the most studied dynamic attacks. But an open question is whether more harmful dynamic attacks exist. We propose random processes to model and measure dynamic attacks. The harm of an attack is influenced by a user's ability to learn from the past. We consider three types of users: blind users, aware users, and general users. We found for all the three types, camouflage attacks are far from the most harmful. We identified the most harmful attacks, under which we found the ratings may still be useful to users.


Building Earth Mover's Distance on Bilingual Word Embeddings for Machine Translation

AAAI Conferences

Following their monolingual counterparts, bilingual word embeddings are also on the rise. As a major application task, word translation has been relying on the nearest neighbor to connect embeddings cross-lingually. However, the nearest neighbor strategy suffers from its inherently local nature and fails to cope with variations in realistic bilingual word embeddings. Furthermore, it lacks a mechanism to deal with many-to-many mappings that often show up across languages. We introduce Earth Mover's Distance to this task by providing a natural formulation that translates words in a holistic fashion, addressing the limitations of the nearest neighbor. We further extend the formulation to a new task of identifying parallel sentences, which is useful for statistical machine translation systems, thereby expanding the application realm of bilingual word embeddings. We show encouraging performance on both tasks.


Implicit Discourse Relation Classification via Multi-Task Neural Networks

AAAI Conferences

Without discourse connectives, classifying implicit discourse relations is a challenging task and a bottleneck for building a practical discourse parser. Previous research usually makes use of one kind of discourse framework such as PDTB or RST to improve the classification performance on discourse relations. Actually, under different discourse annotation frameworks, there exist multiple corpora which have internal connections. To exploit the combination of different discourse corpora, we design related discourse classification tasks specific to a corpus, and propose a novel Convolutional Neural Network embedded multi-task learning system to synthesize these tasks by learning both unique and shared representations for each task. The experimental results on the PDTB implicit discourse relation classification task demonstrate that our model achieves significant gains over baseline systems.


Quantifying Robustness of Trust Systems against Collusive Unfair Rating Attacks Using Information Theory

AAAI Conferences

Unfair rating attacks happen in existing trust and reputation systems, lowering the quality of the systems. There exists a formal model that measures the maximum impact of independent attackers [Wang et al., 2015] — based on information theory. We improve on these results in multiple ways: (1) we alter the methodology to be able to reason about colluding attackers as well, and (2) we extend the method to be able to measure the strength of any attacks (rather than just the strongest attack). Using (1), we identify the strongest collusion attacks, helping construct robust trust system. Using (2), we identify the strength of (classes of) attacks that we found in the literature. Based on this, we help to overcome a shortcoming of current research into collusion-resistance — specific (types of) attacks are used in simulations, disallowing direct comparisons between analyses of systems.


Iterative Learning of Parallel Lexicons and Phrases from Non-Parallel Corpora

AAAI Conferences

While parallel corpora are an indispensable resource for data-driven multilingual natural language processing tasks such as machine translation, they are limited in quantity, quality and coverage. As a result, learning translation models from non-parallel corpora has become increasingly important nowadays, especially for low-resource languages. In this work, we propose a joint model for iteratively learning parallel lexicons and phrases from nonparallel corpora. The model is trained using a Viterbi EM algorithm that alternates between constructing parallel phrases using lexicons and updating lexicons based on the constructed parallel phrases. Experiments on Chinese-English datasets show that our approach learns better parallel lexicons and phrases and improves translation performance significantly.