Machine Translation



New AI Model Translates 200 Languages, Making Technology Accessible to More People -- I-COM

#artificialintelligence

Language is our lifeline to the world. But because high-quality translation tools don't exist for hundreds of languages, billions of people today can't access digital content or participate fully in conversations and communities online in their preferred or native languages. This is particularly an issue for hundreds of millions of people who speak the many languages of Africa and Asia. To help people connect better today and be part of the metaverse of tomorrow, our AI researchers created No Language Left Behind (NLLB), an effort to develop high-quality machine translation capabilities for most of the world's languages. Today, we're announcing an important breakthrough in NLLB: We've built a single AI model called NLLB-200, which translates 200 different languages with results far more accurate than what previous technology could accomplish.


Bilingual Terminology Extraction from Comparable E-Commerce Corpora

arXiv.org Artificial Intelligence

Bilingual terminologies are important machine translation resources in the field of e-commerce, and they are usually either manually translated or automatically extracted from parallel data. Human translation is costly, and e-commerce parallel corpora are very scarce. However, comparable data in different languages in the same commodity field is abundant. In this paper, we propose a novel framework for extracting e-commerce bilingual terminologies from comparable data. Benefiting from cross-lingual pre-training in e-commerce, our framework can make full use of the deep semantic relationship between the source-side terminology and the target-side sentence to extract the corresponding target terminology. Experimental results on various language pairs show that our approaches achieve significantly better performance than various strong baselines.


Benchmarking Azerbaijani Neural Machine Translation

arXiv.org Artificial Intelligence

Little research has been done on Neural Machine Translation (NMT) for Azerbaijani. In this paper, we benchmark the performance of Azerbaijani-English NMT systems across a range of techniques and datasets. We evaluate which segmentation techniques work best for Azerbaijani translation and benchmark the performance of Azerbaijani NMT models across several domains of text. Our results show that while Unigram segmentation improves NMT performance and Azerbaijani translation models scale better with dataset quality than quantity, cross-domain generalization remains a challenge.


General Cross-Architecture Distillation of Pretrained Language Models into Matrix Embeddings

arXiv.org Artificial Intelligence

Large pretrained language models (PreLMs) are revolutionizing natural language processing across all benchmarks. However, their sheer size is prohibitive for small laboratories or for deployment on mobile devices. Approaches like pruning and distillation reduce the model size but typically retain the same model architecture. In contrast, we explore distilling PreLMs into a different, more efficient architecture, Continual Multiplication of Words (CMOW), which embeds each word as a matrix and uses matrix multiplication to encode sequences. We extend the CMOW architecture and its CMOW/CBOW-Hybrid variant with a bidirectional component for more expressive power, per-token representations for a general (task-agnostic) distillation during pretraining, and a two-sequence encoding scheme that facilitates downstream tasks on sentence pairs, such as sentence similarity and natural language inference. Our matrix-based bidirectional CMOW/CBOW-Hybrid model is competitive with DistilBERT on question similarity and recognizing textual entailment, but uses only half the number of parameters and is three times faster at inference. We match or exceed the scores of ELMo on all tasks of the GLUE benchmark except the sentiment analysis task SST-2 and the linguistic acceptability task CoLA. However, compared to previous cross-architecture distillation approaches, we demonstrate a doubling of the scores on detecting linguistic acceptability. This shows that matrix-based embeddings can be used to distill large PreLMs into competitive models and motivates further research in this direction.
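
The core idea of encoding a sequence by multiplying per-word matrices can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vocabulary, the matrix size, the near-identity initialization, and the function names (`init_embeddings`, `encode`) are assumptions made for illustration, and the bidirectional component, hybrid variant, and distillation objective are not shown.

```python
import random

def identity(d):
    """Return a d x d identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def init_embeddings(vocab, d, seed=0, scale=0.1):
    """Assumed initialization: identity plus small Gaussian noise per word."""
    rng = random.Random(seed)
    eye = identity(d)
    return {w: [[eye[i][j] + rng.gauss(0.0, scale) for j in range(d)]
                for i in range(d)] for w in vocab}

def encode(tokens, emb, d):
    """Multiply the word matrices left to right, then flatten to a vector."""
    out = identity(d)
    for tok in tokens:
        out = matmul(out, emb[tok])
    return [x for row in out for x in row]

d = 3
emb = init_embeddings(["dog", "bites", "man"], d)
forward = encode(["dog", "bites", "man"], emb, d)
reverse = encode(["man", "bites", "dog"], emb, d)
```

Because matrix multiplication is not commutative, `forward` and `reverse` differ, so the encoding is sensitive to word order. A bag-of-words sum of vectors would give the same result for both sentences.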


Persona-Knowledge Dialogue Multi-Context Retrieval and Enhanced Decoding Methods

arXiv.org Artificial Intelligence

Persona and Knowledge dual context open-domain chat is a novel dialogue generation task introduced recently. While Persona and Knowledge are each an interesting context for open-domain dialogue, the combination of both has not been well studied. We tackle Persona-Knowledge identification and response generation tasks in this paper. We design an informed data augmentation strategy that is compatible with neural Q&A retrieval models. With the augmented data, we perform permutative Persona-Knowledge evaluation and successive Persona search fine-tuning. Furthermore, we perform dialogue generation with various decoding techniques and illustrate crucial elements. We achieve SOTA across official metrics, with an average Grounding accuracy of 93.99% and a SacreBLEU score of 23.62.


Real-time Translations with AI - KDnuggets

#artificialintelligence

That's what the doll in Squid Game says. But how would you know! You've got subtitles on your plate. Shows like Squid Game and Money Heist topping the Netflix charts opened up a whole new genre of drama and entertainment, letting audiences explore content in different languages. With people locked indoors during the pandemic, the world came closer together in its own unique ways.


Amazon AI Releases PyTorch-Based 'Sockeye 3': The Latest Version of the Sockeye Toolkit for Neural Machine Translation (NMT)

#artificialintelligence

The performance of machine translation systems, which previously relied on phrase-based methods, improved sharply with the advent of neural network-based models. An open-source framework called Sockeye was released in 2018. The framework provides a fast and reliable PyTorch implementation of neural machine translation (NMT) and related tasks. It powers Amazon Translate and several other NMT applications. In 2020, an improved version, Sockeye 2, was launched.


Are Neighbors Enough? Multi-Head Neural n-gram can be Alternative to Self-attention

arXiv.org Artificial Intelligence

The impressive performance of the Transformer has been attributed to self-attention, where dependencies across the entire input sequence are considered at every position. In this work, we reform the neural $n$-gram model, which focuses on only several surrounding representations of each position, with the multi-head mechanism as in Vaswani et al. (2017). Through experiments on sequence-to-sequence tasks, we show that replacing self-attention in the Transformer with multi-head neural $n$-gram can achieve comparable or better performance than the Transformer. From various analyses of our proposed method, we find that multi-head neural $n$-gram is complementary to self-attention, and that combining them can further improve the performance of the vanilla Transformer.
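
One possible reading of a multi-head $n$-gram layer is sketched below. It is a hedged illustration, not the paper's implementation: the left-context window, the zero padding, the per-head linear projection, and the name `ngram_heads` are all assumptions. The point it shows is locality: each output position depends only on the $n$ representations in its window, whereas self-attention looks at the whole sequence.

```python
import random

def ngram_heads(xs, weights, n):
    """xs: list of T vectors of size d.
    weights: one (d_h x n*d_h) matrix per head, where d_h = d // num_heads.
    Each head concatenates its slice of the last n positions and projects it."""
    num_heads = len(weights)
    d_h = len(xs[0]) // num_heads
    outputs = []
    for t in range(len(xs)):
        out_t = []
        for h, w in enumerate(weights):
            window = []
            for pos in range(t - n + 1, t + 1):
                if pos < 0:
                    window.extend([0.0] * d_h)  # zero-pad before the sequence start
                else:
                    window.extend(xs[pos][h * d_h:(h + 1) * d_h])
            out_t.extend(sum(wi * xi for wi, xi in zip(row, window)) for row in w)
        outputs.append(out_t)
    return outputs

rng = random.Random(0)
d, num_heads, n, T = 4, 2, 3, 6
d_h = d // num_heads
weights = [[[rng.uniform(-1, 1) for _ in range(n * d_h)] for _ in range(d_h)]
           for _ in range(num_heads)]
xs = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(T)]
ys = ngram_heads(xs, weights, n)

# Perturb only the first token and re-encode.
xs_perturbed = [list(x) for x in xs]
xs_perturbed[0] = [v + 1.0 for v in xs_perturbed[0]]
ys_perturbed = ngram_heads(xs_perturbed, weights, n)
```

With a window of three, changing the first token alters the outputs at positions 0 to 2 but leaves position 5 untouched, which is the restriction to "several surrounding representations" that the abstract describes.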


Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation

arXiv.org Artificial Intelligence

We introduce Bi-SimCut: a simple but effective training strategy to boost neural machine translation (NMT) performance. It consists of two procedures: bidirectional pretraining and unidirectional finetuning. Both procedures utilize SimCut, a simple regularization method that enforces consistency between the output distributions of the original and the cutoff sentence pairs. Without leveraging extra data via back-translation or integrating a large-scale pretrained model, Bi-SimCut achieves strong translation performance across five translation benchmarks (data sizes range from 160K to 20.2M): BLEU scores of 31.16 for en -> de and 38.37 for de -> en on the IWSLT14 dataset, 30.78 for en -> de and 35.15 for de -> en on the WMT14 dataset, and 27.17 for zh -> en on the WMT17 dataset. SimCut is not a new method, but a version of Cutoff (Shen et al., 2020) simplified and adapted for NMT, and it can be considered a perturbation-based method. Given the universality and simplicity of SimCut and Bi-SimCut, we believe they can serve as strong baselines for future NMT research.
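
The consistency idea behind SimCut can be sketched as a loss that adds a divergence term between the model's output distribution on the original input and on a perturbed ("cutoff") input. The sketch below is an assumption-laden toy, not the paper's method: the stand-in `toy_model` (a softmax over mean-pooled embeddings), the token-level cutoff, the weight `alpha`, and the cutoff probability `p_cut` are all hypothetical, and a real NMT system would apply this to source and target sentence pairs.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def kl(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def token_cutoff(embs, p_cut, rng):
    """Zero out each token embedding independently with probability p_cut."""
    return [[0.0] * len(e) if rng.random() < p_cut else list(e) for e in embs]

def toy_model(embs, w):
    """Stand-in model: mean-pool the embeddings, apply a linear layer, softmax."""
    dim = len(embs[0])
    mean = [sum(e[j] for e in embs) / len(embs) for j in range(dim)]
    return softmax([sum(wi * mi for wi, mi in zip(row, mean)) for row in w])

def simcut_style_loss(embs, target, w, alpha, p_cut, rng):
    """Cross-entropy on the original input plus alpha * KL(original || cutoff)."""
    p_orig = toy_model(embs, w)
    p_pert = toy_model(token_cutoff(embs, p_cut, rng), w)
    return -math.log(p_orig[target]) + alpha * kl(p_orig, p_pert)

rng = random.Random(0)
embs = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
w = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
cross_entropy = -math.log(toy_model(embs, w)[1])
loss_no_cut = simcut_style_loss(embs, 1, w, alpha=3.0, p_cut=0.0, rng=rng)
loss_full_cut = simcut_style_loss(embs, 1, w, alpha=3.0, p_cut=1.0, rng=rng)
```

With `p_cut=0.0` the perturbed input equals the original, so the consistency term vanishes and the loss reduces to the cross-entropy. As the cutoff becomes more aggressive, the term penalizes the model for changing its predictions.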