AITopics | Machine Translation

Collaborating Authors

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

News Overviews Instructional Materials AI-Alerts Classics

The Widely-Spoken Languages We Still Can't Translate Online

WIREDNov-28-2018, 21:09:03 GMT

In the internet age, when we face a language barrier, there are a host of internet resources to solve it: things like translation apps, dictionary websites, versions of Wikipedia in other languages, and the simple "click to translate" option. But there are about 7000 languages spoken in the world today. The top 10 or so are spoken by hundred of millions of speakers; the bottom third have 1000 speakers or fewer. Gretchen McCulloch is WIRED's resident linguist. Her book Because Internet: Understanding the New Rules of Language is due out in July 2019 from Penguin.

artificial intelligence, natural language, social media, (17 more...)

WIRED

Country: North America > United States (0.33)

Industry:

Government > Regional Government > North America Government > United States Government (0.33)
Government > Military (0.33)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Stochastic Gradient Push for Distributed Deep Learning

Assran, Mahmoud, Loizou, Nicolas, Ballas, Nicolas, Rabbat, Michael

arXiv.org Artificial IntelligenceNov-26-2018

Large mini-batch parallel SGD is commonly used for distributed training of deep networks. Approaches that use tightly-coupled exact distributed averaging based on AllReduce are sensitive to slow nodes and high-latency communication. In this work we show the applicability of Stochastic Gradient Push (SGP) for distributed training. SGP uses a gossip algorithm called PushSum for approximate distributed averaging, allowing for much more loosely coupled communications, which can be beneficial in high-latency or high-variability scenarios. The tradeoff is that approximate distributed averaging injects additional noise in the gradient which can affect the train and test accuracies. We prove that SGP converges to a stationary point of smooth, non-convex objective functions. Furthermore, we validate empirically the potential of SGP. For example, using 32 nodes with 8 GPUs per node to train ResNet-50 on ImageNet, where nodes communicate over 10Gbps Ethernet, SGP completes 90 epochs in around 1.6 hours while AllReduce SGD takes over 5 hours, and the top-1 validation accuracy of SGP remains within 1.2% of that obtained using AllReduce SGD.

machine learning, natural language, node, (20 more...)

arXiv.org Artificial Intelligence

1811.10792

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.72)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)

Add feedback

Sentence Encoding with Tree-constrained Relation Networks

Yu, Lei, d'Autume, Cyprien de Masson, Dyer, Chris, Blunsom, Phil, Kong, Lingpeng, Ling, Wang

arXiv.org Artificial IntelligenceNov-26-2018

The meaning of a sentence is a function of the relations that hold between its words. We instantiate this relational view of semantics in a series of neural models based on variants of relation networks (RNs) which represent a set of objects (for us, words forming a sentence) in terms of representations of pairs of objects. We propose two extensions to the basic RN model for natural language. First, building on the intuition that not all word pairs are equally informative about the meaning of a sentence, we use constraints based on both supervised and unsupervised dependency syntax to control which relations influence the representation. Second, since higher-order relations are poorly captured by a sum of pairwise relations, we use a recurrent extension of RNs to propagate information so as to form representations of higher order relations. Experiments on sentence classification, sentence pair classification, and machine translation reveal that, while basic RNs are only modestly effective for sentence representation, recurrent RNs with latent syntax are a reliably powerful representational device.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

1811.10475

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(2 more...)

Add feedback

Connecting the Dots Between MLE and RL for Sequence Generation

Tan, Bowen, Hu, Zhiting, Yang, Zichao, Salakhutdinov, Ruslan, Xing, Eric

arXiv.org Artificial IntelligenceNov-23-2018

Sequence generation models such as recurrent networks can be trained with a diverse set of learning algorithms. For example, maximum likelihood learning is simple and efficient, yet suffers from the exposure bias problem. Reinforcement learning like policy gradient addresses the problem but can have prohibitively poor exploration efficiency. A variety of other algorithms such as RAML, SPG, and data noising, have also been developed from different perspectives. This paper establishes a formal connection between these algorithms. We present a generalized entropy regularized policy optimization formulation, and show that the apparently divergent algorithms can all be reformulated as special instances of the framework, with the only difference being the configurations of reward function and a couple of hyperparameters. The unified interpretation offers a systematic view of the varying properties of exploration and learning efficiency. Besides, based on the framework, we present a new algorithm that dynamically interpolates among the existing algorithms for improved learning. Experiments on machine translation and text summarization demonstrate the superiority of the proposed algorithm.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

1811.0974

Country:

North America > United States (0.28)
Asia (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Add feedback

Learning from Multiview Correlations in Open-Domain Videos

Holzenberger, Nils, Palaskar, Shruti, Madhyastha, Pranava, Metze, Florian, Arora, Raman

arXiv.org Artificial IntelligenceNov-21-2018

An increasing number of datasets contain multiple views, such as video, sound and automatic captions. A basic challenge in representation learning is how to leverage multiple views to learn better representations. This is further complicated by the existence of a latent alignment between views, such as between speech and its transcription, and by the multitude of choices for the learning objective. We explore an advanced, correlation-based representation learning method on a 4-way parallel, multimodal dataset, and assess the quality of the learned representations on retrieval-based tasks. We show that the proposed approach produces rich representations that capture most of the information shared across views. Our best models for speech and textual modalities achieve retrieval rates from 70.7% to 96.9% on open-domain, user-generated instructional videos. This shows it is possible to learn reliable representations across disparate, unaligned and noisy modalities, and encourages using the proposed approach on larger datasets.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

1811.0889

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (0.64)

Industry: Education > Educational Technology (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Neural Machine Translation with Adequacy-Oriented Learning

Kong, Xiang, Tu, Zhaopeng, Shi, Shuming, Hovy, Eduard, Zhang, Tong

arXiv.org Artificial IntelligenceNov-20-2018

Although Neural Machine Translation (NMT) models have advanced state-of-the-art performance in machine translation, they face problems like the inadequate translation. We attribute this to that the standard Maximum Likelihood Estimation (MLE) cannot judge the real translation quality due to its several limitations. In this work, we propose an adequacy-oriented learning mechanism for NMT by casting translation as a stochastic policy in Reinforcement Learning (RL), where the reward is estimated by explicitly measuring translation adequacy. Benefiting from the sequence-level training of RL strategy and a more accurate reward designed specifically for translation, our model outperforms multiple strong baselines, including (1) standard and coverage-augmented attention models with MLE-based training, and (2) advanced reinforcement and adversarial training strategies with rewards based on both word-level BLEU and character-level chrF3. Quantitative and qualitative analyses on different language pairs and NMT architectures demonstrate the effectiveness and universality of the proposed approach.

machine learning, natural language, translation, (16 more...)

arXiv.org Artificial Intelligence

1811.08541

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

Add feedback

WEST: Word Encoded Sequence Transducers

Variani, Ehsan, Suresh, Ananda Theertha, Weintraub, Mitchel

arXiv.org Machine LearningNov-20-2018

Most of the parameters in large vocabulary models are used in embedding layer to map categorical features to vectors and in softmax layer for classification weights. This is a bottle-neck in memory constraint on-device training applications like federated learning and on-device inference applications like automatic speech recognition (ASR). One way of compressing the embedding and softmax layers is to substitute larger units such as words with smaller sub-units such as characters. However, often the sub-unit models perform poorly compared to the larger unit models. We propose WEST, an algorithm for encoding categorical features and output classes with a sequence of random or domain dependent sub-units and demonstrate that this transduction can lead to significant compression without compromising performance. WEST bridges the gap between larger unit and sub-unit models and can be interpreted as a MaxEnt model over sub-unit features, which can be of independent interest.

arxiv preprint arxiv, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

1811.08417

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.91)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)

Add feedback

Towards Neural Machine Translation for African Languages

Abbott, Jade Z., Martinus, Laura

arXiv.org Machine LearningNov-13-2018

Given that South African education is in crisis, strategies for improvement and sustainability of high-quality, up-to-date education must be explored. In the migration of education online, inclusion of machine translation for low-resourced local languages becomes necessary. This paper aims to spur the use of current neural machine translation (NMT) techniques for low-resourced local languages. The paper demonstrates state-of-the-art performance on English-to-Setswana translation using the Autshumato dataset. The use of the Transformer architecture beat previous techniques by 5.33 BLEU points. This demonstrates the promise of using current NMT techniques for African languages.

machine learning, natural language, translation, (17 more...)

arXiv.org Machine Learning

1811.05467

Country: Africa > South Africa > Gauteng (0.14)

Genre: Research Report (0.64)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

Add feedback

ReDecode Framework for Iterative Improvement in Paraphrase Generation

Aggarwal, Milan, Kumari, Nupur, Bansal, Ayush, Krishnamurthy, Balaji

arXiv.org Artificial IntelligenceNov-11-2018

Generating paraphrases, that is, different variations of a sentence conveying the same meaning, is an important yet challenging task in NLP. Automatically generating paraphrases has its utility in many NLP tasks like question answering, information retrieval, conversational systems to name a few. In this paper, we introduce iterative refinement of generated paraphrases within VAE based generation framework. Current sequence generation models lack the capability to (1) make improvements once the sentence is generated; (2) rectify errors made while decoding. We propose a technique to iteratively refine the output using multiple decoders, each one attending on the output sentence generated by the previous decoder. We improve current state of the art results significantly - with over 9% and 28% absolute increase in METEOR scores on Quora question pairs and MSCOCO datasets respectively. We also show qualitatively through examples that our re-decoding approach generates better paraphrases compared to a single decoder by rectifying errors and making improvements in paraphrase structure, inducing variations and introducing new but semantically coherent information.

decoder, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

1811.04454

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Neural-based Pinyin-to-Character Conversion with Adaptive Vocabulary

Huang, Yafang, Zhang, Zhuosheng, Zhao, Hai

arXiv.org Artificial IntelligenceNov-11-2018

Pinyin-to-character (P2C) conversion is the core component of pinyin-based Chinese input method engine (IME). However, the conversion is seriously compromised by the ambiguities of Chinese characters corresponding to pinyin as well as the predefined fixed vocabularies. To alleviate such inconveniences, we propose a neural P2C conversion model augmented by a large online updating vocabulary with a target vocabulary sampling mechanism. Our experiments show that the proposed approach reduces the decoding time on CPUs up to 50$\%$ on P2C tasks at the same or only negligible change in conversion accuracy, and the online updated vocabulary indeed helps our IME effectively follows user inputting behavior.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

1811.04352

Country: Asia > China (0.29)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback