AITopics | Machine Translation

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation

Chung, Insoo, Kim, Byeongwook, Choi, Yoonjung, Kwon, Se Jung, Jeon, Yongkweon, Park, Baeseong, Kim, Sangha, Lee, Dongsoo

arXiv.org Machine Learning, Oct-13-2020

The deployment of the widely used Transformer architecture is challenging because of the heavy computation load and memory overhead during inference, especially when the target device has limited computational resources, such as a mobile or edge device. Quantization is an effective technique to address such challenges. Our analysis shows that, for a given number of quantization bits, each block of the Transformer contributes to translation quality and inference computations in a different manner. Moreover, even inside an embedding block, each word makes a vastly different contribution. Correspondingly, we propose a mixed precision quantization strategy to represent Transformer weights by an extremely low number of bits (e.g., under 3 bits). For example, for each word in an embedding block, we assign a different number of quantization bits based on its statistical properties. Our quantized Transformer model is 11.8× smaller than the baseline model, with a BLEU loss of less than 0.5. We achieve an 8.3× reduction in run-time memory footprint and a 3.5× speed-up (Galaxy N10+), so that our proposed compression strategy enables efficient implementation of on-device NMT.

quantization, quantization bit, transformer, (16 more...)

arXiv.org Machine Learning

2009.07453

Country:

Asia > Middle East > Iran (0.05)
Asia > Pakistan (0.04)
Asia > Middle East > Israel (0.04)
(17 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
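
The mixed-precision idea in the abstract above can be illustrated with a minimal sketch. It assumes plain uniform (min-max) quantization and a hypothetical frequency-rank rule for choosing bit widths; the function names and thresholds are illustrative only, and the paper's actual quantizer and bit-assignment statistics are not given in the abstract.

```python
def quantize_uniform(values, bits):
    """Snap each float to one of 2**bits evenly spaced levels in [min, max]."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    if hi == lo:
        return [lo for _ in values]
    step = (hi - lo) / levels
    return [lo + round((v - lo) / step) * step for v in values]


def bits_for_word(rank, vocab_size):
    """Hypothetical rule: frequent words (low rank) keep more bits."""
    if rank < vocab_size * 0.1:
        return 3
    if rank < vocab_size * 0.5:
        return 2
    return 1


def quantize_embeddings(embeddings):
    """Quantize rows that are assumed sorted by descending word frequency."""
    n = len(embeddings)
    return [quantize_uniform(row, bits_for_word(i, n))
            for i, row in enumerate(embeddings)]


def average_bits(vocab_size):
    """Mean number of bits per embedding row under the rule above."""
    return sum(bits_for_word(i, vocab_size) for i in range(vocab_size)) / vocab_size
```

Under this illustrative rule most rows use 1 or 2 bits, so the average stays below 3 bits while the most frequent words keep finer resolution.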

Self-Paced Learning for Neural Machine Translation

Wan, Yu, Yang, Baosong, Wong, Derek F., Zhou, Yikai, Chao, Lidia S., Zhang, Haibo, Chen, Boxing

arXiv.org Artificial Intelligence, Oct-13-2020

Recent studies have shown that the training of neural machine translation (NMT) can be facilitated by mimicking the learning process of humans. Nevertheless, the success of this kind of curriculum learning relies on the quality of an artificial schedule drawn up with handcrafted features, e.g. sentence length or word rarity. We improve this procedure in a more flexible manner by proposing self-paced learning, where the NMT model is allowed to 1) automatically quantify the learning confidence over training examples; and 2) flexibly govern its learning by regulating the loss in each iteration step. Experimental results over multiple translation tasks demonstrate that the proposed model yields better performance than strong baselines and models trained with human-designed curricula, in both translation quality and convergence speed.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2020.emnlp-main.80

2010.04505

Country:

Asia > Macao (0.05)
Asia > China (0.05)
Europe > Czechia > Prague (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
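
The two steps named in the abstract above (quantify confidence, then regulate the loss) might be sketched as follows. This is only an illustration: `confidence_from_loss` is a hypothetical stand-in that treats low loss as high confidence, and the paper's actual confidence estimate is not described in the abstract.

```python
import math


def confidence_from_loss(loss):
    """Hypothetical confidence in (0, 1]: lower loss means higher confidence."""
    return math.exp(-loss)


def self_paced_loss(losses):
    """Confidence-weighted mean of per-example losses for one batch."""
    weights = [confidence_from_loss(l) for l in losses]
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, losses)) / total


def plain_mean(losses):
    """Unweighted mean, shown for comparison."""
    return sum(losses) / len(losses)
```

With this weighting, examples the model is unsure about contribute less to the update, so the weighted loss is never larger than the plain mean.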

Dissecting Lottery Ticket Transformers: Structural and Behavioral Study of Sparse Neural Machine Translation

Movva, Rajiv, Zhao, Jason Y.

arXiv.org Machine Learning, Oct-12-2020

Recent work on the lottery ticket hypothesis has produced highly sparse Transformers for NMT while maintaining BLEU. However, it is unclear how such pruning techniques affect a model's learned representations. By probing Transformers with more and more low-magnitude weights pruned away, we find that complex semantic information is first to be degraded. Analysis of internal activations reveals that higher layers diverge most over the course of pruning, gradually becoming less complex than their dense counterparts. Meanwhile, early layers of sparse models begin to perform more encoding. Attention mechanisms remain remarkably consistent as sparsity increases.

artificial intelligence, natural language, representation, (18 more...)

arXiv.org Machine Learning

2009.1327

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Gambling (0.61)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
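
Pruning low-magnitude weights, as mentioned in the abstract above, can be sketched in a few lines. This is a generic magnitude-pruning step over a flat list of weights, not the authors' code; the function name and the flat-list representation are assumptions made for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest |w|."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def sparsity_of(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)
```

Calling `magnitude_prune` with increasing `sparsity` values gives the progressively sparser models that such a probing study would compare against the dense one.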

Evaluation of Siamese Networks for Semantic Code Search

Sinha, Raunak, Desai, Utkarsh, Tamilselvam, Srikanth, Mani, Senthil

arXiv.org Artificial Intelligence, Oct-12-2020

With the increase in the number of open repositories and discussion forums, the use of natural language for semantic code search has become increasingly common. The accuracy of the results returned by such systems, however, can be low due to 1) limited shared vocabulary between the code and the user query and 2) inadequate semantic understanding of the user query and its relation to code syntax. Siamese networks are well suited to learning such joint relations between data, but have not been explored in the context of code search. In this work, we evaluate Siamese networks for this task by exploring multiple extraction network architectures. These networks independently process code and text descriptions before passing them to a Siamese network to learn embeddings in a common space. We experiment on two different datasets and discover that Siamese networks can act as strong regularizers on networks that extract rich information from code and text, which in turn helps achieve impressive performance on code search, beating previous baselines on two programming languages. We also analyze the embedding space of these networks and provide directions to fully leverage the power of Siamese networks for semantic code search.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2011.01043

Country: Europe > Germany > Berlin (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
(2 more...)
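
Once code and text are embedded in a common space, as the abstract above describes, search reduces to ranking by vector similarity. The sketch below assumes the embeddings already exist and uses cosine similarity; the vectors, snippet names, and the choice of cosine are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_snippets(query_vec, snippet_vecs):
    """Return snippet names ordered from most to least similar to the query."""
    return sorted(snippet_vecs,
                  key=lambda name: cosine(query_vec, snippet_vecs[name]),
                  reverse=True)
```

For example, `rank_snippets([1.0, 0.0], {"a": [0.0, 1.0], "b": [1.0, 0.1]})` places `"b"` first because its vector points in nearly the same direction as the query.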

Look It Up: Bilingual and Monolingual Dictionaries Improve Neural Machine Translation

Zhong, Xing Jie, Chiang, David

arXiv.org Artificial Intelligence, Oct-12-2020

Despite advances in neural machine translation (NMT) quality, rare words continue to be problematic. For humans, the solution to the rare-word problem has long been dictionaries, but dictionaries cannot be straightforwardly incorporated into NMT. In this paper, we describe a new method for "attaching" dictionary definitions to rare words so that the network can learn the best way to use them. We demonstrate improvements of up to 3.1 BLEU using bilingual dictionaries and up to 0.7 BLEU using monolingual source-language dictionaries.

artificial intelligence, natural language, translation, (19 more...)

arXiv.org Artificial Intelligence

2010.05997

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
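
The lookup step behind "attaching" definitions to rare words can be sketched as below. How the paper feeds the attached definitions into the network is not specified in the abstract, so this only pairs each rare source token with its dictionary definition; the function name, the `min_count` threshold, and the data layout are hypothetical.

```python
def attach_definitions(tokens, counts, dictionary, min_count=5):
    """Pair each token with its definition tokens if it is rare and defined."""
    attached = []
    for tok in tokens:
        is_rare = counts.get(tok, 0) < min_count
        definition = dictionary.get(tok) if is_rare else None
        attached.append((tok, definition.split() if definition else []))
    return attached
```

Frequent words and rare words without a dictionary entry are passed through with an empty definition list, so the output always has one pair per input token.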

TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation

Li, Dongxu, Xu, Chenchen, Yu, Xin, Zhang, Kaihao, Swift, Ben, Suominen, Hanna, Li, Hongdong

arXiv.org Artificial Intelligence, Oct-12-2020

Sign language translation (SLT) aims to interpret sign video sequences into text-based natural language sentences. Sign videos consist of continuous sequences of sign gestures with no clear boundaries in between. Existing SLT models usually represent sign visual features in a frame-wise manner so as to avoid explicitly segmenting the videos into isolated signs. However, these methods neglect the temporal information of signs and lead to substantial ambiguity in translation. In this paper, we explore the temporal semantic structures of sign videos to learn more discriminative features. To this end, we first present a novel sign video segment representation which takes into account multiple temporal granularities, thus alleviating the need for accurate video segmentation. Taking advantage of the proposed segment representation, we develop a novel hierarchical sign video feature learning method via a temporal semantic pyramid network, called TSPNet. Specifically, TSPNet introduces an inter-scale attention to evaluate and enhance local semantic consistency of sign segments and an intra-scale attention to resolve semantic ambiguity by using non-local video context. Experiments show that our TSPNet outperforms the state-of-the-art with significant improvements on the BLEU score (from 9.58 to 13.41) and ROUGE score (from 31.80 to 34.96) on the largest commonly-used SLT dataset.

machine learning, natural language, translation, (18 more...)

arXiv.org Artificial Intelligence

2010.05468

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Finland > Southwest Finland > Turku (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine (1.00)
Education > Curriculum > Subject-Specific Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
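
A segment representation with multiple temporal granularities, as described in the abstract above, can be pictured as sliding windows of several widths over the same video. The sketch below only enumerates frame ranges; the window widths, the stride, and the function names are illustrative assumptions rather than the paper's settings.

```python
def sliding_windows(num_frames, width, stride):
    """Return (start, end) frame ranges of a fixed width, end exclusive."""
    if num_frames < width:
        return []
    return [(s, s + width) for s in range(0, num_frames - width + 1, stride)]


def multi_scale_segments(num_frames, widths, stride):
    """Map each window width to its list of frame ranges over the video."""
    return {w: sliding_windows(num_frames, w, stride) for w in widths}
```

Because the same frames are covered by windows of different lengths, no single segmentation of the video into isolated signs has to be exactly right.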

It's not a Non-Issue: Negation as a Source of Error in Machine Translation

Hossain, Md Mosharaf, Anastasopoulos, Antonios, Blanco, Eduardo, Palmer, Alexis

arXiv.org Artificial Intelligence, Oct-11-2020

As machine translation (MT) systems progress at a rapid pace, questions of their adequacy linger. In this study we focus on negation, a universal, core property of human language that significantly affects the semantics of an utterance. We investigate whether translating negation is an issue for modern MT systems, using 17 translation directions as a test bed. Through thorough analysis, we find that the presence of negation can indeed significantly impact downstream quality, in some cases resulting in quality reductions of more than 60%. We also provide a linguistically motivated analysis that directly explains the majority of our findings. We release our annotations and code to replicate our analysis here: https://github.com/mosharafhossain/negation-mt.

machine learning, natural language, negation, (15 more...)

arXiv.org Artificial Intelligence

2010.05432

Country:

North America > United States > Texas (0.14)
Europe > Germany > Saxony > Leipzig (0.05)
Europe > Italy > Tuscany > Florence (0.04)
(10 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Lexically Cohesive Neural Machine Translation with Copy Mechanism

Mishra, Vipul, Chu, Chenhui, Arase, Yuki

arXiv.org Artificial Intelligence, Oct-11-2020

Lexically cohesive translations preserve consistency in word choices in document-level translation. We incorporate a copy mechanism into a context-aware neural machine translation model to allow copying words from previous translation outputs. Unlike previous context-aware neural machine translation models that handle all the discourse phenomena implicitly, our model explicitly addresses the lexical cohesion problem by boosting the probabilities of outputting words consistently. We conduct experiments on Japanese-to-English translation using an evaluation dataset for discourse translation. The results show that the proposed model significantly improves lexical cohesion compared to previous context-aware models.

artificial intelligence, natural language, translation, (17 more...)

arXiv.org Artificial Intelligence

2010.05193

Country:

Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre: Research Report > New Finding (0.87)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
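
Boosting the probability of words that already appeared in earlier output, as the abstract above describes, is commonly done by mixing a generation distribution with a copy distribution. The sketch below shows that generic mixture; the fixed `gate` value, the function name, and the copy scores are assumptions for illustration and do not reproduce the paper's model.

```python
def mix_with_copy(p_gen, copy_scores, gate):
    """Blend generation probabilities with a normalized copy distribution.

    p_gen: word -> probability from the decoder.
    copy_scores: word -> non-negative score for words in previous outputs.
    gate: weight in [0, 1] given to the copy distribution.
    """
    total = sum(copy_scores.values())
    if total <= 0:
        # Nothing to copy from: fall back to the generation distribution.
        return dict(p_gen)
    p_copy = {w: s / total for w, s in copy_scores.items()}
    vocab = set(p_gen) | set(p_copy)
    return {w: (1 - gate) * p_gen.get(w, 0.0) + gate * p_copy.get(w, 0.0)
            for w in vocab}
```

If the decoder slightly prefers "car" but "automobile" was used in the previous sentence, the copy term shifts probability toward "automobile", keeping the word choice consistent.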

On the Computational Power of Transformers and its Implications in Sequence Modeling

Bhattamishra, Satwik, Patel, Arkil, Goyal, Navin

arXiv.org Machine Learning, Oct-10-2020

Transformers are being used extensively across several sequence modeling tasks. Significant research effort has been devoted to experimentally probing the inner workings of Transformers. However, our conceptual and theoretical understanding of their power and inherent limitations is still nascent. In particular, the roles of various components in Transformers, such as positional encodings, attention heads, residual connections, and feedforward networks, are not clear. In this paper, we take a step towards answering these questions. We analyze the computational power as captured by Turing-completeness. We first provide an alternate and simpler proof to show that vanilla Transformers are Turing-complete, and then we prove that Transformers with only positional masking and without any positional encoding are also Turing-complete. We further analyze the necessity of each component for the Turing-completeness of the network; interestingly, we find that a particular type of residual connection is necessary. We demonstrate the practical implications of our results via experiments on machine translation and synthetic tasks.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2006.09286

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(5 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
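
The "positional masking" mentioned in the abstract above is assumed here to mean the usual causal mask, in which each position attends only to itself and earlier positions. The sketch computes masked softmax attention weights from a square matrix of raw scores; the function name and the plain-list representation are illustrative.

```python
import math


def masked_attention_weights(scores):
    """Row i gets a softmax over positions j <= i and zeros for j > i."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]
        m = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        z = sum(exps)
        weights.append([e / z for e in exps] + [0.0] * (len(row) - i - 1))
    return weights
```

Even with identical scores everywhere, the rows differ (the first row attends to one position, the second to two, and so on), which hints at how masking alone can carry order information without explicit positional encodings.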

Cue-word Driven Neural Response Generation with a Shrinking Vocabulary

Wang, Qiansheng, Liu, Yuxin, Lv, Chengguo, Wang, Zhen, Fu, Guohong

arXiv.org Artificial Intelligence, Oct-10-2020

Open-domain response generation is the task of generating sensible and informative responses to the source sentence. However, neural models tend to generate safe and meaningless responses. While cue-word introducing approaches encourage responses with concrete semantics and have shown tremendous potential, they still fail to explore diverse responses during decoding. In this paper, we propose a novel but natural approach that produces multiple cue-words during decoding and then uses them to drive decoding and shrink the decoding vocabulary. Thus the neural generation model can explore the full space of responses and discover informative ones efficiently. Experimental results show that our approach significantly outperforms several strong baseline models with much lower decoding complexity. In particular, our approach converges to concrete semantics more efficiently during decoding.

arxiv preprint arxiv, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2010.04927

Country: North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
