 Machine Translation


XL-Editor: Post-editing Sentences with XLNet

arXiv.org Machine Learning

While neural sequence generation models have achieved initial success in many NLP applications, the canonical one-pass decoding procedure with left-to-right generation order (i.e., autoregressive) cannot reflect how humans actually revise a sentence to obtain a refined result. In this work, we propose XL-Editor, a novel training framework that enables state-of-the-art generalized autoregressive pretraining methods, XLNet specifically, to revise a given sentence using the variable-length insertion probability. Concretely, XL-Editor can (1) estimate the probability of inserting a variable-length sequence into a specific position of a given sentence; (2) execute post-editing operations such as insertion, deletion, and replacement based on the estimated variable-length insertion probability; (3) complement existing sequence-to-sequence models to refine the generated sequences. Empirically, we first demonstrate better post-editing capabilities of XL-Editor over XLNet on the text insertion and deletion tasks, which validates the effectiveness of our proposed framework. Furthermore, we extend XL-Editor to the unpaired text style transfer task, where transferring the target style onto a given sentence can be naturally viewed as post-editing the sentence into the target style. XL-Editor achieves significant improvement in style transfer accuracy while maintaining the coherent semantics of the original sentence, showing the broad applicability of our method.


Facebook makes big advances in AI reasoning and machine translation - SiliconANGLE

#artificialintelligence

Facebook Inc. is using its @Scale conference today to provide an update on its progress in artificial intelligence research. The social media company is open-sourcing a new "AI reasoning" platform and providing some updates on its research into machine translation. It's part of a broad push to scale up AI workloads, a difficult task given the massive amounts of data needed to train AI models, Srinivas Narayanan (pictured), the lead for Facebook's Applied AI Research, said this morning at the conference in San Jose, California. "Facebook wouldn't be where it is today without AI," Narayanan said. "It's deeply integrated into everything we do."


A language processing algorithm for predicting tactical solutions to an operational planning problem under uncertainty

arXiv.org Machine Learning

This paper is devoted to the prediction of solutions to a stochastic discrete optimization problem. Through an application, we illustrate how a state-of-the-art neural machine translation (NMT) algorithm can be used to predict the solutions by defining appropriate vocabularies, syntaxes and constraints. We focus on applications where the predictions must be computed very quickly, on the order of milliseconds or less. The results show that with minimal adaptations to the model architecture and hyperparameter tuning, the NMT algorithm can produce accurate solutions within the computing time budget. While these predictions are slightly less accurate than approximate stochastic programming solutions (sample average approximation), they can be computed faster and with less variability.


Overcoming the Rare Word Problem for Low-Resource Language Pairs in Neural Machine Translation

arXiv.org Machine Learning

Among the six challenges of neural machine translation (NMT) identified by Koehn and Knowles (2017), the rare-word problem is considered the most severe, especially when translating low-resource languages. In this paper, we propose three solutions to address rare words in neural machine translation systems. First, we enhance the source context used to predict target words by directly connecting the source embeddings to the output of the attention component in NMT. Second, we propose an algorithm that learns the morphology of unknown English words in a supervised way in order to minimize the adverse effect of the rare-word problem. Finally, we exploit synonymy relations from WordNet to overcome the out-of-vocabulary (OOV) problem of NMT. We evaluate our approaches on two low-resource language pairs: English-Vietnamese and Japanese-Vietnamese. In our experiments, we achieve significant improvements of up to roughly 1.0 BLEU points in both language pairs.
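The third idea in the abstract above, substituting an out-of-vocabulary word with a synonym the model already knows, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `replace_oov`, the `<unk>` fallback token, and the hand-written synonym table (standing in for a real WordNet lookup) are all assumptions made for illustration.

```python
def replace_oov(tokens, vocab, synonyms, unk="<unk>"):
    """Replace each out-of-vocabulary token with its first in-vocabulary synonym.

    `synonyms` is a hypothetical dict mapping a word to candidate synonyms;
    in practice such candidates might come from WordNet. If no candidate is
    in the vocabulary, the token falls back to `unk`.
    """
    result = []
    for token in tokens:
        if token in vocab:
            result.append(token)
            continue
        # Take the first synonym the model knows, otherwise the unknown marker.
        substitute = next(
            (s for s in synonyms.get(token, []) if s in vocab), unk
        )
        result.append(substitute)
    return result


# Toy data, invented for this example only.
vocab = {"the", "car", "is", "fast"}
synonyms = {"automobile": ["motorcar", "car"], "rapid": ["fast", "quick"]}

print(replace_oov(["the", "automobile", "is", "rapid"], vocab, synonyms))
```

Picking the first in-vocabulary synonym is the simplest possible policy; a real system would likely need word-sense disambiguation before substituting.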


Translation by the numbers: Facebook AI puts words into multidimensional spaces

The Japan Times

PARIS - Designers of machine translation tools still mostly rely on dictionaries to make a foreign language understandable. But now there is a new way: numbers. Facebook researchers say rendering words into figures and exploiting mathematical similarities between languages is a promising avenue -- even if a universal communicator as seen in "Star Trek" remains a distant dream. Powerful automatic translation is a big priority for internet giants. Allowing as many people as possible worldwide to communicate is not just an altruistic goal, but also good business.


Lost in Translation?

#artificialintelligence

Fueled by improvements in speech recognition, machine learning, better algorithms, cloud processing, and more powerful computing devices, the quality of machine translations is improving. Learning another language has never been a simple proposition. It can take months of study to absorb the basics and years to become fluent. Of course, there's the added headache that learning a language doesn't help if a person encounters one of the world's other 7,000 or so languages. "There has always been a need for human translators and interpreters," says Andrew Ochoa, CEO of translation technology firm Waverly Labs.


MLQA: Evaluating Cross-lingual Extractive Question Answering

arXiv.org Artificial Intelligence

Question answering (QA) models have shown rapid progress enabled by the availability of large, high-quality benchmark datasets. Such annotated datasets are difficult and costly to collect, and rarely exist in languages other than English, making training QA systems in other languages challenging. An alternative to building large monolingual training datasets is to develop cross-lingual systems which can transfer to a target language without requiring training data in that language. In order to develop such systems, it is crucial to invest in high quality multilingual evaluation benchmarks to measure progress. We present MLQA, a multi-way aligned extractive QA evaluation benchmark intended to spur research in this area. MLQA contains QA instances in 7 languages, namely English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese. It consists of over 12K QA instances in English and 5K in each other language, with each QA instance being parallel between 4 languages on average. MLQA is built using a novel alignment context strategy on Wikipedia articles, and serves as a cross-lingual extension to existing extractive QA datasets. We evaluate current state-of-the-art cross-lingual representations on MLQA, and also provide machine-translation-based baselines. In all cases, transfer results are shown to be significantly behind training-language performance.


Fully Quantized Transformer for Improved Translation

arXiv.org Machine Learning

Abstract: State-of-the-art neural machine translation methods employ massive numbers of parameters. Drastically reducing the computational costs of such methods without affecting performance has so far remained unsolved. In this work, we propose a quantization strategy tailored to the Transformer (Vaswani et al., 2017) architecture. We evaluate our method on the WMT14 EN-FR and WMT14 EN-DE translation tasks and achieve state-of-the-art quantization results for the Transformer, obtaining no loss in BLEU scores compared to the non-quantized baseline. We further compress the Transformer by showing that, once the model is trained, a good portion of the nodes in the encoder can be removed without causing any loss in BLEU. Introduction: Neural machine translation methods have achieved impressive results lately (Ahmed et al., 2017; Ott et al., 2018; Edunov et al., 2018). Although these methods were proposed only recently (Kalchbrenner & Blunsom, 2013; Sutskever et al., 2014; Cho et al., 2014), many great works have moved the field forward quickly. Bahdanau et al. (2014) introduced an attention mechanism, allowing the decoder to attend to any hidden state generated by the encoder. Multiple improvements to their approach have been proposed, such as multiplicative attention (Luong et al., 2015) and more recently multi-head self-attention (Vaswani et al., 2017).
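To make the notion of quantization in the abstract above concrete, here is a minimal sketch of generic uniform min-max quantization of a list of weights to k-bit integer codes. It is an illustration only and not necessarily the scheme the paper uses; the function names `quantize_uniform` and `dequantize` and the 8-bit default are assumptions.

```python
def quantize_uniform(values, bits=8):
    """Map floats to integer codes in [0, 2**bits - 1] with a min-max range.

    Returns (codes, offset, scale) so that value ~= offset + code * scale.
    Generic uniform quantization, shown for illustration only.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    if hi == lo:
        # Degenerate range: every value maps to code 0.
        return [0] * len(values), lo, 1.0
    scale = (hi - lo) / levels
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, offset, scale):
    """Reconstruct approximate floats from integer codes."""
    return [offset + c * scale for c in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]  # toy values, invented for the example
codes, offset, scale = quantize_uniform(weights, bits=8)
restored = dequantize(codes, offset, scale)
print(codes)
print(restored)
```

With rounding to the nearest level, each restored value differs from the original by at most half of `scale`, which is why lower bit widths trade accuracy for size.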


Root Mean Square Layer Normalization

arXiv.org Machine Learning

Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence because of its capability in handling re-centering and re-scaling of both inputs and weight matrix. However, the computational overhead introduced by LayerNorm makes these improvements expensive and significantly slows the underlying network, RNNs in particular. In this paper, we hypothesize that re-centering invariance in LayerNorm is dispensable and propose root mean square layer normalization, or RMSNorm. RMSNorm regularizes the summed inputs to a neuron in one layer according to the root mean square (RMS), giving the model a re-scaling invariance property and an implicit learning rate adaptation ability. RMSNorm is computationally simpler and thus more efficient than LayerNorm. We also present partial RMSNorm, or pRMSNorm, where the RMS is estimated from p% of the summed inputs without breaking the above properties. Extensive experiments on several tasks using diverse network architectures show that RMSNorm achieves performance comparable to LayerNorm while reducing the running time by 7% to 64% on different models. Source code is available at https://github.com/bzhangGo/rmsnorm.
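The normalization described in the abstract above can be sketched in a few lines of plain Python: each summed input is divided by the root mean square of the inputs and multiplied by a gain, with no mean subtraction. This is a minimal sketch, not the authors' code (see the linked repository for that). The `eps` stabilizer, the default unit gain, and estimating the partial RMS from the leading `p` fraction of the inputs are assumptions of this example.

```python
import math


def rms_norm(a, gain=None, eps=1e-8, p=None):
    """Scale inputs by their root mean square, without re-centering.

    a    : list of summed inputs to one layer
    gain : optional per-element gain (defaults to all ones, an assumption)
    eps  : small constant to avoid division by zero (an assumption)
    p    : optional fraction in (0, 1]; if given, the RMS is estimated from
           the first ceil(p * n) inputs only (a pRMSNorm-style estimate)
    """
    n = len(a)
    if gain is None:
        gain = [1.0] * n
    k = n if p is None else max(1, math.ceil(n * p))
    rms = math.sqrt(sum(x * x for x in a[:k]) / k)
    return [g * x / (rms + eps) for x, g in zip(a, gain)]


x = [3.0, 4.0]
print(rms_norm(x))
# Re-scaling invariance: multiplying the input by a constant leaves the
# output (almost exactly) unchanged.
print(rms_norm([10 * v for v in x]))
```

Because the mean is never subtracted, the output is generally not zero-centered, which is exactly the re-centering step the abstract hypothesizes is dispensable.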


AI could be a force for good - but we're currently heading for a darker future

#artificialintelligence

Artificial Intelligence (AI) is already re-configuring the world in conspicuous ways. Data drives our global digital ecosystem, and AI technologies reveal patterns in data. Smartphones, smart homes, and smart cities influence how we live and interact, and AI systems are increasingly involved in recruitment decisions, medical diagnoses, and judicial verdicts. Whether this scenario is utopian or dystopian depends on your perspective. The potential risks of AI are enumerated repeatedly.