Goto

Collaborating Authors

 Machine Translation


Sentiment-based Candidate Selection for NMT

arXiv.org Artificial Intelligence

The explosion of user-generated content (UGC)--e.g. social media posts, comments, and reviews--has motivated the development of NLP applications tailored to these types of informal texts. Prevalent among these applications have been sentiment analysis and machine translation (MT). Grounded in the observation that UGC features highly idiomatic, sentiment-charged language, we propose a decoder-side approach that incorporates automatic sentiment scoring into the MT candidate selection process. We train separate English and Spanish sentiment classifiers, then, using n-best candidates generated by a baseline MT model with beam search, select the candidate that minimizes the absolute difference between the sentiment score of the source sentence and that of the translation, and perform a human evaluation to assess the produced translations. Unlike previous work, we select this minimally divergent translation by considering the sentiment scores of the source sentence and translation on a continuous interval, rather than using e.g. binary classification, allowing for more fine-grained selection of translation candidates. The results of human evaluations show that, in comparison to the open-source MT baseline model on top of which our sentiment-based pipeline is built, our pipeline produces more accurate translations of colloquial, sentiment-heavy source texts.


XFORMAL: A Benchmark for Multilingual Formality Style Transfer

arXiv.org Artificial Intelligence

We take the first step towards multilingual style transfer by creating and releasing XFORMAL, a benchmark of multiple formal reformulations of informal text in Brazilian Portuguese, French, and Italian. Results on XFORMAL suggest that state-of-the-art style transfer approaches perform close to simple baselines, indicating that style transfer is even more challenging when moving multilingual.


Extended Parallel Corpus for Amharic-English Machine Translation

arXiv.org Artificial Intelligence

This paper describes the acquisition, preprocessing, segmentation, and alignment of an Amharic-English parallel corpus. It will be useful for machine translation of an under-resourced language, Amharic. The corpus is larger than previously compiled corpora; it is released for research purposes. We trained neural machine translation and phrase-based statistical machine translation models using the corpus. In the automatic evaluation, neural machine translation models outperform phrase-based statistical machine translation models.


Dynabench: Rethinking Benchmarking in NLP

arXiv.org Artificial Intelligence

We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the-loop dataset creation: annotators seek to create examples that a target model will misclassify, but that another person will not. In this paper, we argue that Dynabench addresses a critical need in our community: contemporary models quickly achieve outstanding performance on benchmark tasks but nonetheless fail on simple challenge examples and falter in real-world scenarios. With Dynabench, dataset creation, model development, and model assessment can directly inform each other, leading to more robust and informative benchmarks. We report on four initial NLP tasks, illustrating these concepts and highlighting the promise of the platform, and address potential objections to dynamic benchmarking as a new standard for the field.


Efficient transfer learning for NLP with ELECTRA

arXiv.org Artificial Intelligence

Scope of Reproducibility Clark et al. [2020] claims that the ELECTRA approach is highly efficient in NLP performances relative to computation budget. As such, this study focus on this claim, summarized by the following question: Can we use ELECTRA to achieve close to SOTA performances for NLP in low-resource settings, in term of compute cost? Methodology This replication study has been conducted by fully reimplementing the small variant of the original ELECTRA model (Clark et al. [2020]). All experiments are performed on single GPU computers. GLUE benchmark dev set (Wang et al. [2018]) is used for models evaluation and compared with the original paper.


FixMyPose: Pose Correctional Captioning and Retrieval

arXiv.org Artificial Intelligence

Interest in physical therapy and individual exercises such as yoga/dance has increased alongside the well-being trend. However, such exercises are hard to follow without expert guidance (which is impossible to scale for personalized feedback to every trainee remotely). Thus, automated pose correction systems are required more than ever, and we introduce a new captioning dataset named FixMyPose to address this need. We collect descriptions of correcting a "current" pose to look like a "target" pose (in both English and Hindi). The collected descriptions have interesting linguistic properties such as egocentric relations to environment objects, analogous references, etc., requiring an understanding of spatial relations and commonsense knowledge about postures. Further, to avoid ML biases, we maintain a balance across characters with diverse demographics, who perform a variety of movements in several interior environments (e.g., homes, offices). From our dataset, we introduce the pose-correctional-captioning task and its reverse target-pose-retrieval task. During the correctional-captioning task, models must generate descriptions of how to move from the current to target pose image, whereas in the retrieval task, models should select the correct target pose given the initial pose and correctional description. We present strong cross-attention baseline models (uni/multimodal, RL, multilingual) and also show that our baselines are competitive with other models when evaluated on other image-difference datasets. We also propose new task-specific metrics (object-match, body-part-match, direction-match) and conduct human evaluation for more reliable evaluation, and we demonstrate a large human-model performance gap suggesting room for promising future work. To verify the sim-to-real transfer of our FixMyPose dataset, we collect a set of real images and show promising performance on these images.


The Quest for Human Parity Machine Translation

#artificialintelligence

Recently some in the Singularity community have admitted that "language is hard" as you can see in this attempt to explain why AI has not mastered translation yet. Michael Housman, a faculty member of Singularity University, explained that the ideal scenario for machine learning and artificial intelligence is something with fixed rules and a clear-cut measure of success or failure. He named chess as an obvious example and noted machines were able to beat the best human Go player. This happened faster than anyone anticipated because of the game's very clear rules and limited set of moves. Housman elaborated, "Language is almost the opposite of that. There aren't as clearly-cut and defined rules. The conversation can go in an infinite number of different directions. And then of course, you need labeled data. You need to tell the machine to do it right or wrong."


Attention Forcing for Machine Translation

arXiv.org Artificial Intelligence

Auto-regressive sequence-to-sequence models with attention mechanisms have achieved state-of-the-art performance in various tasks including Text-To-Speech (TTS) and Neural Machine Translation (NMT). The standard training approach, teacher forcing, guides a model with the reference output history. At inference stage, the generated output history must be used. This mismatch can impact performance. However, it is highly challenging to train the model using the generated output. Several approaches have been proposed to address this problem, normally by selectively using the generated output history. To make training stable, these approaches often require a heuristic schedule or an auxiliary classifier. This paper introduces attention forcing for NMT. This approach guides the model with the generated output history and reference attention, and can reduce the training-inference mismatch without a schedule or a classifier. Attention forcing has been successful in TTS, but its application to NMT is more challenging, due to the discrete and multi-modal nature of the output space. To tackle this problem, this paper adds a selection scheme to vanilla attention forcing, which automatically selects a suitable training approach for each pair of training data. Experiments show that attention forcing can improve the overall translation quality and the diversity of the translations.


How we taught Google Translate to stop being sexist

#artificialintelligence

Online translation tools have helped us learn new languages, communicate across linguistic borders, and view foreign websites in our native tongue. But the artificial intelligence (AI) behind them is far from perfect, often replicating rather than rejecting the biases that exist within a language or a society. Such tools are especially vulnerable to gender stereotyping because some languages (such as English) don't tend to gender nouns, while others (such as German) do. When translating from English to German, translation tools have to decide which gender to assign English words like "cleaner." Overwhelmingly, the tools conform to the stereotype, opting for the feminine word in German.


Many-to-English Machine Translation Tools, Data, and Pretrained Models

arXiv.org Artificial Intelligence

While there are more than 7000 languages in the world, most translation research efforts have targeted a few high-resource languages. Commercial translation systems support only one hundred languages or fewer, and do not make these models available for transfer to low resource languages. In this work, we present useful tools for machine translation research: MTData, NLCodec, and RTG. We demonstrate their usefulness by creating a multilingual neural machine translation model capable of translating from 500 source languages to English. We make this multilingual model readily downloadable and usable as a service, or as a parent model for transfer-learning to even lower-resource languages.