Machine Translation
Ham2Pose: Animating Sign Language Notation into Pose Sequences
Shalev-Arkushin, Rotem, Moryossef, Amit, Fried, Ohad
Translating spoken languages into Sign languages is necessary for open communication between the hearing and hearing-impaired communities. To achieve this goal, we propose the first method for animating a text written in HamNoSys, a lexical Sign language notation, into signed pose sequences. As HamNoSys is universal by design, our proposed method offers a generic solution invariant to the target Sign language. Our method gradually generates pose predictions using transformer encoders that create meaningful representations of the text and poses while considering their spatial and temporal information. We use weak supervision for the training process and show that our method succeeds in learning from partial and inaccurate data. Additionally, we offer a new distance measurement that considers missing keypoints, to measure the distance between pose sequences using DTW-MJE. We validate its correctness using AUTSL, a large-scale Sign language dataset, show that it measures the distance between pose sequences more accurately than existing measurements, and use it to assess the quality of our generated pose sequences. Code for the data pre-processing, the model, and the distance measurement is publicly released for future research.
Exploiting Multilingualism in Low-resource Neural Machine Translation via Adversarial Learning
Kumar, Amit, Pratap, Ajay, Singh, Anil Kumar
Generative Adversarial Networks (GAN) offer a promising approach for Neural Machine Translation (NMT). However, feeding multiple morphologically languages into a single model during training reduces the NMT's performance. In GAN, similar to bilingual models, multilingual NMT only considers one reference translation for each sentence during model training. This single reference translation limits the GAN model from learning sufficient information about the source sentence representation. Thus, in this article, we propose Denoising Adversarial Auto-encoder-based Sentence Interpolation (DAASI) approach to perform sentence interpolation by learning the intermediate latent representation of the source and target sentences of multilingual language pairs. Apart from latent representation, we also use the Wasserstein-GAN approach for the multilingual NMT model by incorporating the model generated sentences of multiple languages for reward computation. This computed reward optimizes the performance of the GAN-based multilingual model in an effective manner. We demonstrate the experiments on low-resource language pairs and find that our approach outperforms the existing state-of-the-art approaches for multilingual NMT with a performance gain of up to 4 BLEU points. Moreover, we use our trained model on zero-shot language pairs under an unsupervised scenario and show the robustness of the proposed approach.
From Text to Meaning: How Natural Language Processing Algorithms Work
Natural language processing (NLP) is a field of study that combines computer science and linguistics to help machines understand human language. NLP has become an integral part of modern technology, powering everything from chatbots to voice assistants. But how exactly do NLP algorithms work? And why do they matter? At its core, NLP is about teaching machines to understand human language.
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Beyer, Lucas, Wan, Bo, Madan, Gagan, Pavetic, Filip, Steiner, Andreas, Kolesnikov, Alexander, Pinto, Andrรฉ Susano, Bugliarello, Emanuele, Wang, Xiao, Yu, Qihang, Chen, Liang-Chieh, Zhai, Xiaohua
There has been a recent explosion of computer vision models which perform many tasks and are composed of an image encoder (usually a ViT) and an autoregressive decoder (usually a Transformer). However, most of this work simply presents one system and its results, leaving many questions regarding design decisions and trade-offs of such systems unanswered. In this work, we aim to provide such answers. We take a close look at autoregressive decoders for multi-task learning in multimodal computer vision, including classification, captioning, visual question answering, and optical character recognition. Through extensive systematic experiments, we study the effects of task and data mixture, training and regularization hyperparameters, conditioning type and specificity, modality combination, and more. Importantly, we compare these to well-tuned single-task baselines to highlight the cost incurred by multi-tasking. A key finding is that a small decoder learned on top of a frozen pretrained encoder works surprisingly well. We call this setup locked-image tuning with decoder (LiT-decoder). It can be seen as teaching a decoder to interact with a pretrained vision model via natural language.
XL8 Integrates Zixi, Enhancing Global Reach of Content
Zixi, the industry leader for enabling cost-efficient and highly scalable live broadcast-quality video over any IP network or protocol and provider of the award-winning SDVP, announced a partnership with XL8 that has integrated Zixi into their innovative LiveSubs translation engine to create real-time subtitles powered by its proprietary state-of-the-art AI technology. LiveSubs allows customers to take their Zixi stream and generate live subtitled languages on the fly, from the source language into over 70 global language pairs. Media companies are under increasing pressure to meet the worldwide demand for hyper-localized translated media in the live distribution space and XL8 takes AI-powered machine translation, specially optimized for media content, to the next level. Its advanced technology allows significantly more efficient workflows by providing in-line editing, automated media transcription with time coding, automated subtitling, synthesized voice dubbing, real-time meeting interpretation including a soon-to-be-released Zoom app, and live subtitling. XL8's uniquely specialized translation engines have been built from the ground up utilizing professionally trained, human-perfected subtitles curated from the media industry's top content producers.
Machine Translation with Attention in TensorFlow Python from Scratch
Sequence to Sequence (Seq2Seq) models have been used extensively in various Natural Language Processing (NLP) tasks such as machine translation, text summarization, and question answering. In this blog post, we will implement a Seq2Seq model for Italian-to-English machine translation using TensorFlow and Python OOPs. The model architecture will consist of an Encoder, a Decoder, and an Attention mechanism. The first step in any machine learning task is to preprocess the data. We will be using a dataset of Italian-English sentence pairs for our translation task.
Language-Family Adapters for Low-Resource Multilingual Neural Machine Translation
Chronopoulou, Alexandra, Stojanovski, Dario, Fraser, Alexander
Large multilingual models trained with self-supervision achieve state-of-the-art results in a wide range of natural language processing tasks. Self-supervised pretrained models are often fine-tuned on parallel data from one or multiple language pairs for machine translation. Multilingual fine-tuning improves performance on low-resource languages but requires modifying the entire model and can be prohibitively expensive. Training a new adapter on each language pair or training a single adapter on all language pairs without updating the pretrained model has been proposed as a parameter-efficient alternative. However, the former does not permit any sharing between languages, while the latter shares parameters for all languages and is susceptible to negative interference. In this paper, we propose training language-family adapters on top of mBART-50 to facilitate cross-lingual transfer. Our approach outperforms related baselines, yielding higher translation scores on average when translating from English to 17 different low-resource languages. We also show that language-family adapters provide an effective method to translate to languages unseen during pretraining.
Hallucinations in Large Multilingual Translation Models
Guerreiro, Nuno M., Alves, Duarte, Waldendorf, Jonas, Haddow, Barry, Birch, Alexandra, Colombo, Pierre, Martins, Andrรฉ F. T.
Large-scale multilingual machine translation systems have demonstrated remarkable ability to translate directly between numerous languages, making them increasingly appealing for real-world applications. However, when deployed in the wild, these models may generate hallucinated translations which have the potential to severely undermine user trust and raise safety concerns. Existing research on hallucinations has primarily focused on small bilingual models trained on high-resource languages, leaving a gap in our understanding of hallucinations in massively multilingual models across diverse translation scenarios. In this work, we fill this gap by conducting a comprehensive analysis on both the M2M family of conventional neural machine translation models and ChatGPT, a general-purpose large language model~(LLM) that can be prompted for translation. Our investigation covers a broad spectrum of conditions, spanning over 100 translation directions across various resource levels and going beyond English-centric language pairs. We provide key insights regarding the prevalence, properties, and mitigation of hallucinations, paving the way towards more responsible and reliable machine translation systems.
Multi-lingual Evaluation of Code Generation Models
Athiwaratkun, Ben, Gouda, Sanjay Krishna, Wang, Zijian, Li, Xiaopeng, Tian, Yuchen, Tan, Ming, Ahmad, Wasi Uddin, Wang, Shiqi, Sun, Qing, Shang, Mingyue, Gonugondla, Sujan Kumar, Ding, Hantian, Kumar, Varun, Fulton, Nathan, Farahani, Arash, Jain, Siddhartha, Giaquinto, Robert, Qian, Haifeng, Ramanathan, Murali Krishna, Nallapati, Ramesh, Ray, Baishakhi, Bhatia, Parminder, Sengupta, Sudipta, Roth, Dan, Xiang, Bing
We present new benchmarks on evaluation code generation models: MBXP and Multilingual HumanEval, and MathQA-X. These datasets cover over 10 programming languages and are generated using a scalable conversion framework that transpiles prompts and test cases from the original Python datasets into the corresponding data in the target language. Using these benchmarks, we are able to assess the performance of code generation models in a multi-lingual fashion, and discovered generalization ability of language models on out-of-domain languages, advantages of multi-lingual models over mono-lingual, the ability of few-shot prompting to teach the model new languages, and zero-shot translation abilities even on mono-lingual settings. Furthermore, we use our code generation model to perform large-scale bootstrapping to obtain synthetic canonical solutions in several languages, which can be used for other code-related evaluations such as code insertion, robustness, or summarization tasks. Overall, our benchmarks represents a significant step towards a deeper understanding of language models' code generation abilities. We publicly release our code and datasets at https://github.com/amazon-research/mxeval.
Linguistically Informed ChatGPT Prompts to Enhance Japanese-Chinese Machine Translation: A Case Study on Attributive Clauses
In the field of Japanese-Chinese translation linguistics, the issue of correctly translating attributive clauses has persistently proven to be challenging. Present-day machine translation tools often fail to accurately translate attributive clauses from Japanese to Chinese. In light of this, this paper investigates the linguistic problem underlying such difficulties, namely how does the semantic role of the modified noun affect the selection of translation patterns for attributive clauses, from a linguistic perspective. To ad-dress these difficulties, a pre-edit scheme is proposed, which aims to enhance the accuracy of translation. Furthermore, we propose a novel two-step prompt strategy, which combines this pre-edit scheme with ChatGPT, currently the most widely used large language model. This prompt strategy is capable of optimizing translation input in zero-shot scenarios and has been demonstrated to improve the average translation accuracy score by over 35%.