Goto

Collaborating Authors

 Machine Translation


HABERTOR: An Efficient and Effective Deep Hatespeech Detector

arXiv.org Artificial Intelligence

We present our HABERTOR model for detecting hatespeech in large scale user-generated content. Inspired by the recent success of the BERT model, we propose several modifications to BERT to enhance the performance on the downstream hatespeech classification task. HABERTOR inherits BERT's architecture, but is different in four aspects: (i) it generates its own vocabularies and is pre-trained from the scratch using the largest scale hatespeech dataset; (ii) it consists of Quaternion-based factorized components, resulting in a much smaller number of parameters, faster training and inferencing, as well as less memory usage; (iii) it uses our proposed multi-source ensemble heads with a pooling layer for separate input sources, to further enhance its effectiveness; and (iv) it uses a regularized adversarial training with our proposed fine-grained and adaptive noise magnitude to enhance its robustness. Through experiments on the large-scale real-world hatespeech dataset with 1.4M annotated comments, we show that HABERTOR works better than 15 state-of-the-art hatespeech detection methods, including fine-tuning Language Models. In particular, comparing with BERT, our HABERTOR is 4~5 times faster in the training/inferencing phase, uses less than 1/3 of the memory, and has better performance, even though we pre-train it by using less than 1% of the number of words. Our generalizability analysis shows that HABERTOR transfers well to other unseen hatespeech datasets and is a more efficient and effective alternative to BERT for the hatespeech classification.


A Corpus for English-Japanese Multimodal Neural Machine Translation with Comparable Sentences

arXiv.org Artificial Intelligence

Multimodal neural machine translation (NMT) has become an increasingly important area of research over the years because additional modalities, such as image data, can provide more context to textual data. Furthermore, the viability of training multimodal NMT models without a large parallel corpus continues to be investigated due to low availability of parallel sentences with images, particularly for English-Japanese data. However, this void can be filled with comparable sentences that contain bilingual terms and parallel phrases, which are naturally created through media such as social network posts and e-commerce product descriptions. In this paper, we propose a new multimodal English-Japanese corpus with comparable sentences that are compiled from existing image captioning datasets. In addition, we supplement our comparable sentences with a smaller parallel corpus for validation and test purposes. To test the performance of this comparable sentence translation scenario, we train several baseline NMT models with our comparable corpus and evaluate their English-Japanese translation performance. Due to low translation scores in our baseline experiments, we believe that current multimodal NMT models are not designed to effectively utilize comparable sentence data. Despite this, we hope for our corpus to be used to further research into multimodal NMT with comparable sentences.


AI localization tool claims to translate your words in your voice

Engadget

Localization is a tricky issue for all content creators. It can take significant time and resources to make their work fully accessible to folks who speak different languages. One company thinks it has cracked part of that code with an artificial intelligence system that automatically translates speech into other languages in the same speaker's voice. Resemble AI says its Localize tool can keep voices consistent in various languages in movies, games, audiobooks, corporate videos and other formats. Google is working on similar tech, but we haven't heard much about that since it published a paper on the Translatotron system last year.


DiDi's Machine Translation System for WMT2020

arXiv.org Artificial Intelligence

This paper describes DiDi AI Labs' submission to the WMT2020 news translation shared task. We participate in the translation direction of Chinese->English. In this direction, we use the Transformer as our baseline model, and integrate several techniques for model enhancement, including data filtering, data selection, back-translation, fine-tuning, model ensembling, and re-ranking. As a result, our submission achieves a BLEU score of $36.6$ in Chinese->English.


Generating Diverse Translation from Model Distribution with Dropout

arXiv.org Artificial Intelligence

Despite the improvement of translation quality, neural machine translation (NMT) often suffers from the lack of diversity in its generation. In this paper, we propose to generate diverse translations by deriving a large number of possible models with Bayesian modelling and sampling models from them for inference. The possible models are obtained by applying concrete dropout to the NMT model and each of them has specific confidence for its prediction, which corresponds to a posterior model distribution under specific training data in the principle of Bayesian modeling. With variational inference, the posterior model distribution can be approximated with a variational distribution, from which the final models for inference are sampled. We conducted experiments on Chinese-English and English-German translation tasks and the results shows that our method makes a better trade-off between diversity and accuracy.


Amazon Translate ranked as #1 machine translation provider by Intento

#artificialintelligence

Customer obsession, one of the key Amazon Leadership principles that guides everything we do at Amazon, has helped Amazon Translate be recognized as an industry leading neural machine translation provider. This year, Intento ranked Amazon Translate #1 on the list of top-performing machine translation providers in its The State of Machine Translation 2020 report. We are excited to be recognized for pursuing our passion--designing the best customer experience in machine translation. Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. Neural machine translation is a form of machine translation that uses deep learning models to deliver more accurate and more natural sounding translation than traditional statistical and rule-based translation algorithms.


Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding

arXiv.org Artificial Intelligence

RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and instructions) than other VLN datasets. It emphasizes the role of language in VLN by addressing known biases in paths and eliciting more references to visible entities. Furthermore, each word in an instruction is time-aligned to the virtual poses of instruction creators and validators. We establish baseline scores for monolingual and multilingual settings and multitask learning when including Room-to-Room annotations (Anderson et al., 2018b). We also provide results for a model that learns from synchronized pose traces by focusing only on portions of the panorama attended to in human Figure 1: RxR's instructions are densely grounded to demonstrations. The size, scope and detail of the visual scene by aligning the annotator's virtual pose RxR dramatically expands the frontier for research to their spoken instructions for navigating a path.


Explicit Alignment Objectives for Multilingual Bidirectional Encoders

arXiv.org Artificial Intelligence

Pre-trained cross-lingual encoders such as mBERT (Devlin et al., 2019) and XLMR (Conneau et al., 2020) have proven to be impressively effective at enabling transfer-learning of NLP systems from high-resource languages to low-resource languages. This success comes despite the fact that there is no explicit objective to align the contextual embeddings of words/sentences with similar meanings across languages together in the same space. In this paper, we present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bidirectional EncodeR). AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities. We conduct experiments on zero-shot cross-lingual transfer learning for different tasks including sequence tagging, sentence retrieval and sentence classification. Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model which has 4.6x the parameters of AMBER.


Multilingual Argument Mining: Datasets and Analysis

arXiv.org Artificial Intelligence

The growing interest in argument mining and computational argumentation brings with it a plethora of Natural Language Understanding (NLU) tasks and corresponding datasets. However, as with many other NLU tasks, the dominant language is English, with resources in other languages being few and far between. In this work, we explore the potential of transfer learning using the multilingual BERT model to address argument mining tasks in non-English languages, based on English datasets and the use of machine translation. We show that such methods are well suited for classifying the stance of arguments and detecting evidence, but less so for assessing the quality of arguments, presumably because quality is harder to preserve under translation. In addition, focusing on the translate-train approach, we show how the choice of languages for translation, and the relations among them, affect the accuracy of the resultant model. Finally, to facilitate evaluation of transfer learning on argument mining tasks, we provide a human-generated dataset with more than 10k arguments in multiple languages, as well as machine translation of the English datasets.


With Little Power Comes Great Responsibility

arXiv.org Artificial Intelligence

Despite its importance to experimental design, statistical power (the probability that, given a real effect, an experiment will reject the null hypothesis) has largely been ignored by the NLP community. Underpowered experiments make it more difficult to discern the difference between statistical noise and meaningful model improvements, and increase the chances of exaggerated findings. By meta-analyzing a set of existing NLP papers and datasets, we characterize typical power for a variety of settings and conclude that underpowered experiments are common in the NLP literature. In particular, for several tasks in the popular GLUE benchmark, small test sets mean that most attempted comparisons to state of the art models will not be adequately powered. Similarly, based on reasonable assumptions, we find that the most typical experimental design for human rating studies will be underpowered to detect small model differences, of the sort that are frequently studied. For machine translation, we find that typical test sets of 2000 sentences have approximately 75% power to detect differences of 1 BLEU point. To improve the situation going forward, we give an overview of best practices for power analysis in NLP and release a series of notebooks to assist with future power analyses.