AITopics | Machine Translation

Collaborating Authors

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

News Overviews Instructional Materials AI-Alerts Classics

Can Active Memory Replace Attention?

Kaiser, Łukasz, Bengio, Samy

Neural Information Processing SystemsFeb-14-2020, 14:43:25 GMT

Several mechanisms to focus attention of a neural network on selected parts of its input or memory have been used successfully in deep learning models in recent years. Attention has improved image classification, image captioning, speech recognition, generative models, and learning algorithmic tasks, but it had probably the largest impact on neural machine translation. Recently, similar improvements have been obtained using alternative mechanisms that do not focus on a single part of a memory but operate on all of it in parallel, in a uniform way. Such mechanism, which we call active memory, improved over attention in algorithmic tasks, image processing, and in generative modelling. So far, however, active memory has not improved over attention for most natural language processing tasks, in particular for machine translation.

active memory replace attention, machine translation, mechanism, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.63)

Add feedback

Deliberation Networks: Sequence Generation Beyond One-Pass Decoding

Xia, Yingce, Tian, Fei, Wu, Lijun, Lin, Jianxin, Qin, Tao, Yu, Nenghai, Liu, Tie-Yan

Neural Information Processing SystemsFeb-14-2020, 08:57:32 GMT

The encoder-decoder framework has achieved promising progress for many sequence generation tasks, including machine translation, text summarization, dialog system, image captioning, etc. Such a framework adopts an one-pass forward process while decoding and generating a sequence, but lacks the deliberation process: A generated sequence is directly used as final output without further polishing. However, deliberation is a common behavior in human's daily life like reading news and writing papers/articles/books. In this work, we introduce the deliberation process into the encoder-decoder framework and propose deliberation networks for sequence generation. A deliberation network has two levels of decoders, where the first-pass decoder generates a raw sequence and the second-pass decoder polishes and refines the raw sentence with deliberation.

deliberation network, one-pass decoding, sequence generation, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.66)

Add feedback

Generative Neural Machine Translation

Shah, Harshil, Barber, David

Neural Information Processing SystemsFeb-14-2020, 07:58:25 GMT

We introduce Generative Neural Machine Translation (GNMT), a latent variable architecture which is designed to model the semantics of the source and target sentences. We modify an encoder-decoder translation model by adding a latent variable as a language agnostic representation which is encouraged to learn the meaning of the sentence. GNMT achieves competitive BLEU scores on pure translation tasks, and is superior when there are missing words in the source sentence. We augment the model to facilitate multilingual translation and semi-supervised learning without adding parameters. This framework significantly reduces overfitting when there is limited paired data available, and is effective for translating between pairs of languages not seen during training.

generative neural machine translation

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Dual Learning for Machine Translation

He, Di, Xia, Yingce, Qin, Tao, Wang, Liwei, Yu, Nenghai, Liu, Tie-Yan, Ma, Wei-Ying

Neural Information Processing SystemsFeb-14-2020, 06:59:11 GMT

While neural machine translation (NMT) is making good progress in the past two years, tens of millions of bilingual sentence pairs are needed for its training. However, human labeling is very costly. To tackle this training data bottleneck, we develop a dual-learning mechanism, which can enable an NMT system to automatically learn from unlabeled data through a dual-learning game. This mechanism is inspired by the following observation: any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual); the primal and dual tasks can form a closed loop, and generate informative feedback signals to train the translation models, even if without the involvement of a human labeler. In the dual-learning mechanism, we use one agent to represent the model for the primal task and the other agent to represent the model for the dual task, then ask them to teach each other through a reinforcement learning process.

dual learning, machine translation, translation, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Decoding with Value Networks for Neural Machine Translation

He, Di, Lu, Hanqing, Xia, Yingce, Qin, Tao, Wang, Liwei, Liu, Tie-Yan

Neural Information Processing SystemsFeb-14-2020, 04:58:24 GMT

Neural Machine Translation (NMT) has become a popular technology in recent years, and beam search is its de facto decoding method due to the shrunk search space and reduced computational complexity. However, since it only searches for local optima at each time step through one-step forward looking, it usually cannot output the best target sentence. Inspired by the success and methodology of AlphaGo, in this paper we propose using a prediction network to improve beam search, which takes the source sentence $x$, the currently available decoding output $y_1,\cdots, y_{t-1}$ and a candidate word $w$ at step $t$ as inputs and predicts the long-term value (e.g., BLEU score) of the partial target sentence if it is completed by the NMT model. Following the practice in reinforcement learning, we call this prediction network \emph{value network}. Specifically, we propose a recurrent structure for the value network, and train its parameters from bilingual data.

decoding, neural machine translation, value network, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

CBAG: Conditional Biomedical Abstract Generation

Sybrandt, Justin, Safro, Ilya

arXiv.org Machine LearningFeb-13-2020

Biomedical research papers use significantly different language and jargon when compared to typical English text, which reduces the utility of pre-trained NLP models in this domain. Meanwhile Medline, a database of biomedical abstracts, introduces nearly a million new documents per-year. Applications that could benefit from understanding this wealth of publicly available information, such as scientific writing assistants, chat-bots, or descriptive hypothesis generation systems, require new domain-centered approaches. A conditional language model, one that learns the probability of words given some a priori criteria, is a fundamental building block in many such applications. We propose a transformer-based conditional language model with a shallow encoder "condition" stack, and a deep "language model" stack of multi-headed attention blocks. The condition stack encodes metadata used to alter the output probability distribution of the language model stack. We sample this distribution in order to generate biomedical abstracts given only a proposed title, an intended publication year, and a set of keywords. Using typical natural language generation metrics, we demonstrate that this proposed approach is more capable of producing non-trivial relevant entities within the abstract body than the 1.5B parameter GPT-2 language model.

arxiv preprint arxiv, language model, sequence, (13 more...)

arXiv.org Machine Learning

2002.05637

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.94)
Health & Medicine > Consumer Health (0.93)
Education > Health & Safety > School Nutrition (0.68)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

Add feedback

Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges

Le, Triet H. M., Chen, Hao, Babar, M. Ali

arXiv.org Artificial IntelligenceFeb-13-2020

Deep Learning (DL) techniques for Natural Language Processing have been evolving remarkably fast. Recently, the DL advances in language modeling, machine translation and paragraph understanding are so prominent that the potential of DL in Software Engineering cannot be overlooked, especially in the field of program learning. To facilitate further research and applications of DL in this field, we provide a comprehensive review to categorize and investigate existing DL methods for source code modeling and generation. To address the limitations of the traditional source code models, we formulate common program learning tasks under an encoder-decoder framework. After that, we introduce recent DL mechanisms suitable to solve such problems. Then, we present the state-of-the-art practices and discuss their challenges with some recommendations for practitioners and researchers as well.

arxiv preprint arxiv, neural network, source code, (12 more...)

arXiv.org Artificial Intelligence

2002.05442

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(6 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.45)

Industry:

Leisure & Entertainment (0.68)
Information Technology (0.67)
Education > Educational Setting > Online (0.67)
(2 more...)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
(6 more...)

Add feedback

Superbloom: Bloom filter meets Transformer

Anderson, John, Huang, Qingqing, Krichene, Walid, Rendle, Steffen, Zhang, Li

arXiv.org Machine LearningFeb-11-2020

We extend the idea of word pieces in natural language models to machine learning tasks on opaque ids. This is achieved by applying hash functions to map each id to multiple hash tokens in a much smaller space, similarly to a Bloom filter. We show that by applying a multi-layer Transformer to these Bloom filter digests, we are able to obtain models with high accuracy. They outperform models of a similar size without hashing and, to a large degree, models of a much larger size trained using sampled softmax with the same computational budget. Our key observation is that it is important to use a multi-layer Transformer for Bloom filter digests to remove ambiguity in the hashed input. We believe this provides an alternative method to solving problems with large vocabulary size.

experiment, superbloom, transformer, (15 more...)

arXiv.org Machine Learning

2002.04723

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Florida > Pinellas County > St. Petersburg (0.14)
(25 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Add feedback

Improving Molecular Design by Stochastic Iterative Target Augmentation

Yang, Kevin, Jin, Wengong, Swanson, Kyle, Barzilay, Regina, Jaakkola, Tommi

arXiv.org Machine LearningFeb-11-2020

Generative models in molecular design tend to be richly parameterized, data-hungry neural models, as they must create complex structured objects as outputs. Estimating such models from data may be challenging due to the lack of sufficient training data. In this paper, we propose a surprisingly effective self-training approach for iteratively creating additional molecular targets. We first pre-train the generative model together with a simple property predictor. The property predictor is then used as a likelihood model for filtering candidate structures from the generative model. Additional targets are iteratively produced and used in the course of stochastic EM iterations to maximize the log-likelihood that the candidate structures are accepted. A simple rejection (re-weighting) sampler suffices to draw posterior samples since the generative model is already reasonable after pre-training. We demonstrate significant gains over strong baselines for both unconditional and conditional molecular design. In particular, our approach outperforms the previous state-of-the-art in conditional molecular design by over 10% in absolute gain.

augmentation, molecular design, molecule, (14 more...)

arXiv.org Machine Learning

2002.0472

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning

Yu, Weihao, Jiang, Zihang, Dong, Yanfei, Feng, Jiashi

arXiv.org Artificial IntelligenceFeb-11-2020

Recent powerful pre-trained language models have achieved remarkable performance on most of the popular datasets for reading comprehension. It is time to introduce more challenging datasets to push the development of this field towards more comprehensive reasoning of text. In this paper, we introduce a new Reading Comprehension dataset requiring logical reasoning (ReClor) extracted from standardized graduate admission examinations. As earlier studies suggest, human-annotated datasets usually contain biases, which are often exploited by models to achieve high accuracy without truly understanding the text. In order to comprehensively evaluate the logical reasoning ability of models on ReClor, we propose to identify biased data points and separate them into EASY set while the rest as HARD set. Empirical results show that state-of-the-art models have an outstanding ability to capture biases contained in the dataset with high accuracy on EASY set. However, they struggle on HARD set with poor performance near that of random guess, indicating more research is needed to essentially enhance the logical reasoning ability of current models. 1

conference paper, dataset, reasoning, (11 more...)

arXiv.org Artificial Intelligence

2002.04326

Country:

Atlantic Ocean > North Atlantic Ocean > Bay of Fundy (0.04)
Asia > Singapore (0.04)
Asia > Japan (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Education > Assessment & Standards > Student Performance (1.00)
Education > Educational Setting > Higher Education (0.94)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)

Add feedback