Goto

Collaborating Authors

 Machine Translation


The Quest for Human Parity Machine Translation

#artificialintelligence

Recently some in the Singularity community have admitted that "language is hard" as you can see in this attempt to explain why AI has not mastered translation yet. Michael Housman, a faculty member of Singularity University, explained that the ideal scenario for machine learning and artificial intelligence is something with fixed rules and a clear-cut measure of success or failure. He named chess as an obvious example and noted machines were able to beat the best human Go player. This happened faster than anyone anticipated because of the game's very clear rules and limited set of moves. Housman elaborated, "Language is almost the opposite of that. There aren't as clearly-cut and defined rules. The conversation can go in an infinite number of different directions. And then of course, you need labeled data. You need to tell the machine to do it right or wrong."


Attention Forcing for Machine Translation

arXiv.org Artificial Intelligence

Auto-regressive sequence-to-sequence models with attention mechanisms have achieved state-of-the-art performance in various tasks including Text-To-Speech (TTS) and Neural Machine Translation (NMT). The standard training approach, teacher forcing, guides a model with the reference output history. At inference stage, the generated output history must be used. This mismatch can impact performance. However, it is highly challenging to train the model using the generated output. Several approaches have been proposed to address this problem, normally by selectively using the generated output history. To make training stable, these approaches often require a heuristic schedule or an auxiliary classifier. This paper introduces attention forcing for NMT. This approach guides the model with the generated output history and reference attention, and can reduce the training-inference mismatch without a schedule or a classifier. Attention forcing has been successful in TTS, but its application to NMT is more challenging, due to the discrete and multi-modal nature of the output space. To tackle this problem, this paper adds a selection scheme to vanilla attention forcing, which automatically selects a suitable training approach for each pair of training data. Experiments show that attention forcing can improve the overall translation quality and the diversity of the translations.


How we taught Google Translate to stop being sexist

#artificialintelligence

Online translation tools have helped us learn new languages, communicate across linguistic borders, and view foreign websites in our native tongue. But the artificial intelligence (AI) behind them is far from perfect, often replicating rather than rejecting the biases that exist within a language or a society. Such tools are especially vulnerable to gender stereotyping because some languages (such as English) don't tend to gender nouns, while others (such as German) do. When translating from English to German, translation tools have to decide which gender to assign English words like "cleaner." Overwhelmingly, the tools conform to the stereotype, opting for the feminine word in German.


Many-to-English Machine Translation Tools, Data, and Pretrained Models

arXiv.org Artificial Intelligence

While there are more than 7000 languages in the world, most translation research efforts have targeted a few high-resource languages. Commercial translation systems support only one hundred languages or fewer, and do not make these models available for transfer to low resource languages. In this work, we present useful tools for machine translation research: MTData, NLCodec, and RTG. We demonstrate their usefulness by creating a multilingual neural machine translation model capable of translating from 500 source languages to English. We make this multilingual model readily downloadable and usable as a service, or as a parent model for transfer-learning to even lower-resource languages.


Towards General Purpose Vision Systems

arXiv.org Artificial Intelligence

A special purpose learning system assumes knowledge of admissible tasks at design time. Adapting such a system to unforeseen tasks requires architecture manipulation such as adding an output head for each new task or dataset. In this work, we propose a task-agnostic vision-language system that accepts an image and a natural language task description and outputs bounding boxes, confidences, and text. The system supports a wide range of vision tasks such as classification, localization, question answering, captioning, and more. We evaluate the system's ability to learn multiple skills simultaneously, to perform tasks with novel skill-concept combinations, and to learn new skills efficiently and without forgetting.


Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study

arXiv.org Artificial Intelligence

This work aims to empirically clarify a recently discovered perspective that label smoothing is incompatible with knowledge distillation (Müller et al., 2019). We begin by introducing the motivation behind on how this incompatibility is raised, i.e., label smoothing erases relative information between teacher logits. We provide a novel connection on how label smoothing affects distributions of semantically similar and dissimilar classes. Then we propose a metric to quantitatively measure the degree of erased information in sample's representation. After that, we study its one-sidedness and imperfection of the incompatibility view through massive analyses, visualizations and comprehensive experiments on Image Classification, Binary Networks, and Neural Machine Translation. Finally, we broadly discuss several circumstances wherein label smoothing will indeed lose its effectiveness. Recently a large body of studies is focusing on exploring the underlying relationships between these two methods, for instance, Müller et al. (Müller et al., 2019) discovered that label smoothing could improve calibration implicitly but will hurt the effectiveness of knowledge distillation. Yuan et al. (Yuan et al., 2019) considered knowledge distillation as a dynamical form of label smoothing as it delivered a regularization effect in training. The recent study (Lukasik et al., 2020) further noticed label smoothing could help mitigate label noise, they showed that when distilling models from noisy data, the teacher with label smoothing is helpful.


English-Twi Parallel Corpus for Machine Translation

arXiv.org Artificial Intelligence

We present a parallel machine translation training corpus for English and Akuapem Twi of 25,421 sentence pairs. We used a transformer-based translator to generate initial translations in Akuapem Twi, which were later verified and corrected where necessary by native speakers to eliminate any occurrence of translationese. In addition, 697 higher quality crowd-sourced sentences are provided for use as an evaluation set for downstream Natural Language Processing (NLP) tasks. The typical use case for the larger human-verified dataset is for further training of machine translation models in Akuapem Twi. The higher quality 697 crowd-sourced dataset is recommended as a testing dataset for machine translation of English to Twi and Twi to English models. Furthermore, the Twi part of the crowd-sourced data may also be used for other tasks, such as representation learning, classification, etc. We fine-tune the transformer translation model on the training corpus and report benchmarks on the crowd-sourced test set.


NLP for Ghanaian Languages

arXiv.org Artificial Intelligence

In the much-applauded interventions by Google The advancement in machine learning computational and Microsoft through their translation services, power coupled with the recent investment quite a number of African languages have been within the domain by technological companies integrated, but Ghanaian languages are excluded has stimulated considerable interest and (Google, 2020; Microsoft, 2021). A historic move brought about a legion of applications in natural worth mentioning is Baidu Translate's incorporation language digitisation in developed countries, of the Twi language in their translation service.


Online translators are sexist – here's how we gave them a little gender sensitivity training

#artificialintelligence

Online translation tools have helped us learn new languages, communicate across linguistic borders, and view foreign websites in our native tongue. But the artificial intelligence (AI) behind them is far from perfect, often replicating rather than rejecting the biases that exist within a language or a society. Such tools are especially vulnerable to gender stereotyping, because some languages (such as English) don't tend to gender nouns, while others (such as German) do. When translating from English to German, translation tools have to decide which gender to assign English words like "cleaner". Overwhelmingly, the tools conform to the stereotype, opting for the feminine word in German.


Zero-Shot Language Transfer vs Iterative Back Translation for Unsupervised Machine Translation

arXiv.org Artificial Intelligence

This work focuses on comparing different solutions for machine translation on low resource language pairs, namely, with zero-shot transfer learning and unsupervised machine translation. We discuss how the data size affects the performance of both unsupervised MT and transfer learning. Additionally we also look at how the domain of the data affects the result of unsupervised MT. The code to all the experiments performed in this project are accessible on Github.