Machine Translation
Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering
Liu, Jiacheng, Hallinan, Skyler, Lu, Ximing, He, Pengfei, Welleck, Sean, Hajishirzi, Hannaneh, Choi, Yejin
Knowledge underpins reasoning. Recent research demonstrates that when relevant knowledge is provided as additional context for commonsense question answering (QA), it can substantially improve performance, even over state-of-the-art models. The fundamental challenge is where and how to find knowledge that is high quality and on point with respect to the question; knowledge retrieved from knowledge bases is incomplete, and knowledge generated by language models is inconsistent. We present Rainier, or Reinforced Knowledge Introspector, which learns to generate contextually relevant knowledge in response to given questions. Our approach starts by imitating knowledge generated by GPT-3, then learns to generate its own knowledge via reinforcement learning, where rewards are shaped based on the increased performance on the resulting question answering. Rainier demonstrates substantial and consistent performance gains when tested over 9 different commonsense benchmarks, including 5 datasets that are seen during model training and 4 datasets that are kept unseen. Our work is the first to report that knowledge generated by models that are orders of magnitude smaller than GPT-3, even without direct supervision on the knowledge itself, can exceed the quality of commonsense knowledge elicited from GPT-3.
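The idea of shaping a reward from the gain in QA performance can be illustrated with a minimal sketch. This is not the reward defined in the Rainier paper. It assumes a hypothetical QA model that returns one score per answer choice, and it defines the reward as the change in probability of the gold answer when the generated knowledge is added to the prompt. The function names `softmax` and `knowledge_reward` are illustrative.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of answer-choice scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knowledge_reward(scores_without: List[float],
                     scores_with: List[float],
                     gold_index: int) -> float:
    """Illustrative reward (an assumption, not the paper's formula):
    gain in gold-answer probability when knowledge is prepended."""
    p_without = softmax(scores_without)[gold_index]
    p_with = softmax(scores_with)[gold_index]
    return p_with - p_without


# Hypothetical scores for a 3-choice question whose gold answer is choice 0.
reward = knowledge_reward([0.2, 0.5, 0.1], [1.4, 0.3, 0.1], gold_index=0)
print(f"reward = {reward:.3f}")
```

Under this assumption the reward is positive when the knowledge makes the gold answer more likely, negative when it misleads the QA model, and zero when it changes nothing.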
Welcome Back!
Computational sciences in the India region are going through an exciting time. While India has always had significant strength in theoretical computer science (CS), in recent years it has developed substantial presence and maturity in other, more applied areas of CS such as hardware and computer architecture, data science and artificial intelligence (AI), and cyber-security. Alongside pure research, there has been a significant push toward lab-to-field projects and technology transfer and deployment, creating broad impact on the region and beyond. Significant efforts have been made on the democratization of education through online courses, enabling the vast population to learn from a relatively limited number of available experts. All these activities have continued to bolster India's already strong IT industry and have been a factor in the huge increase in the number of startups (from under 1,000 in 2016 to over 60,000 in 2022), with the number of unicorn startups reaching 100.
Using AI to Translate Speech For a Primarily Oral Language
AI-powered speech translation has mainly focused on written languages, yet nearly 3,500 living languages are primarily spoken and don't have a widely used writing system. This makes it impossible to build machine translation tools using standard techniques, which require large amounts of written text in order to train an AI model. To address this challenge, we've built the first AI-powered speech-to-speech translation system for Hokkien, a primarily oral language that's widely spoken within the Chinese diaspora but lacks a standard written form. We're open-sourcing our Hokkien translation models, evaluation datasets and research papers so that others can reproduce and build on our work. The translation system is part of our Universal Speech Translator project, which is developing new AI methods that we hope will eventually allow real-time speech-to-speech translation across many languages.
Turning Fixed to Adaptive: Integrating Post-Evaluation into Simultaneous Machine Translation
Guo, Shoutao, Zhang, Shaolei, Feng, Yang
Simultaneous machine translation (SiMT) (Gu et al., 2017; Ma et al., 2019; Arivazhagan et al., 2019; Ma et al., 2020; Zhang and Feng, 2021b, 2022d) starts translation before reading the whole source sentence. It seeks to achieve good latency-quality tradeoffs and is suitable for various scenarios with different latency tolerances. Compared to full-sentence machine translation, SiMT is more challenging because it lacks partial source content in translation and needs to decide on translation policy additionally. However, the previous methods, including fixed and adaptive policies, lack evaluation before taking the next action. For fixed policy (Ma et al., 2019; Elbayad et al., 2020; Zhang et al., 2021; Zhang and Feng, 2021c), the model generates translation according to the predefined translation rules. Although it only relies on simple training methods, it cannot make full use of the context to decide an appropriate translation policy. For adaptive policy (Gu et al., 2017; Arivazhagan et al., 2019; Ma et al., 2020; Zhang et al., 2022), the model can ...
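A fixed policy of the kind described above can be sketched with the well-known wait-k rule: read k source tokens first, then alternate between writing one target token and reading one more source token until the source is exhausted. The sketch below is only an illustration of a predefined rule, not the method proposed in this paper, and the function names are hypothetical.

```python
from typing import List


def wait_k_schedule(source_len: int, target_len: int, k: int) -> List[int]:
    """Number of source tokens read before each target token is written.

    For the t-th target token (0-indexed), wait-k reads min(k + t, source_len).
    """
    return [min(k + t, source_len) for t in range(target_len)]


def wait_k_actions(source_len: int, target_len: int, k: int) -> List[str]:
    """Expand the schedule into an explicit READ/WRITE action sequence."""
    actions: List[str] = []
    tokens_read = 0
    for needed in wait_k_schedule(source_len, target_len, k):
        while tokens_read < needed:
            actions.append("READ")
            tokens_read += 1
        actions.append("WRITE")
    return actions


print(wait_k_schedule(source_len=5, target_len=6, k=3))
print(wait_k_actions(source_len=3, target_len=3, k=2))
```

Because the schedule depends only on k and the token positions, it never looks at the content of the sentence, which is the limitation of fixed policies that the passage points out.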
Gui at MixMT 2022: English-Hinglish: An MT approach for translation of code mixed data
Gahoi, Akshat, Duneja, Jayant, Padhi, Anshul, Mangale, Shivam, Rajput, Saransh, Kamble, Tanvi, Sharma, Dipti Misra, Varma, Vasudeva
Code-mixed machine translation has become an important task in multilingual communities, and extending machine translation to code-mixed data has become common for these languages. In the shared tasks of WMT 2022, we try to tackle the same for both English + Hindi to Hinglish and Hinglish to English. The first task dealt with both Roman and Devanagari script, as we had monolingual data in both English and Hindi, whereas the second task only had data in Roman script. To our knowledge, we achieved one of the top ROUGE-L and WER scores for the first task of Monolingual to Code-Mixed machine translation. In this paper, we discuss the use of mBART with some special pre-processing and post-processing (transliteration from Devanagari to Roman) for the first task in detail, and the experiments that we performed for the second task of translating code-mixed Hinglish to monolingual English.
Text Editing as Imitation Game
Shi, Ning, Tang, Bin, Yuan, Bo, Huang, Longtao, Pu, Yewen, Fu, Jie, Lin, Zhouhan
Text editing, such as grammatical error correction, arises naturally from imperfect textual data. Recent works frame text editing as a multi-round sequence tagging task, where operations -- such as insertion and substitution -- are represented as a sequence of tags. While achieving good results, this encoding is limited in flexibility as all actions are bound to token-level tags. In this work, we reformulate text editing as an imitation game using behavioral cloning. Specifically, we convert conventional sequence-to-sequence data into state-to-action demonstrations, where the action space can be as flexible as needed. Instead of generating the actions one at a time, we introduce a dual-decoder structure to parallelize decoding while retaining the dependencies between action tokens, coupled with trajectory augmentation to alleviate the distribution shift that imitation learning often suffers from. In experiments on a suite of Arithmetic Equation benchmarks, our model consistently outperforms the autoregressive baselines in terms of performance, efficiency, and robustness. We hope our findings will shed light on future studies in reinforcement learning applying sequence-level action generation to natural language processing.
A baseline revisited: Pushing the limits of multi-segment models for context-aware translation
Majumder, Suvodeep, Lauly, Stanislas, Nadejde, Maria, Federico, Marcello, Dinu, Georgiana
We show that multi-sentence translation can benefit from increased-capacity transformer models and that deeper models are better at learning contextual dependencies than wider models. We further show that distilled models can learn contextual dependencies from larger models, while reducing computational cost and increasing robustness to input length variations.
The quality of NMT (Neural Machine Translation) models has been improving over the years and is narrowing the gap to human translation performance (Hassan et al., 2018). Until recently, most of the MT research has focused on translating and evaluating sentences in isolation, ignoring the context in which these sentences occur. Simplifying the translation task this way has its advantages: data sets are easier to create, models are computationally ...
University of Cape Town's WMT22 System: Multilingual Machine Translation for Southern African Languages
Elmadani, Khalid N., Meyer, Francois, Buys, Jan
The paper describes the University of Cape Town's submission to the constrained track of the WMT22 Shared Task: Large-Scale Machine Translation Evaluation for African Languages. Our system is a single multilingual translation model that translates between English and 8 South / South East African Languages, as well as between specific pairs of the African languages. We used several techniques suited for low-resource machine translation (MT), including overlap BPE, back-translation, synthetic training data generation, and adding more translation directions during training. Our results show the value of these techniques, especially for directions where very little or no bilingual training data is available.
Non-Autoregressive Neural Machine Translation: A Call for Clarity
Schmidt, Robin M., Pires, Telmo, Peitz, Stephan, Lööf, Jonas
Non-autoregressive approaches aim to improve the inference speed of translation models by only requiring a single forward pass to generate the output sequence instead of iteratively producing each predicted token. Consequently, their translation quality still tends to be inferior to their autoregressive counterparts due to several issues involving output token interdependence. In this work, we take a step back and revisit several techniques that have been proposed for improving non-autoregressive translation models and compare their combined translation quality and speed implications under third-party testing environments. We provide novel insights for establishing strong baselines using length prediction or CTC-based architecture variants and contribute standardized BLEU, chrF++, and TER scores using sacreBLEU on four translation tasks, which crucially have been missing as inconsistencies in the use of tokenized BLEU lead to deviations of up to 1.7 BLEU points. Our open-sourced code is integrated into fairseq for reproducibility.
Is Encoder-Decoder Redundant for Neural Machine Translation?
Gao, Yingbo, Herold, Christian, Yang, Zijian, Ney, Hermann
The encoder-decoder architecture is widely adopted for sequence-to-sequence modeling tasks. For machine translation, despite the evolution from long short-term memory networks to Transformer networks, plus the introduction and development of the attention mechanism, encoder-decoder is still the de facto neural network architecture for state-of-the-art models. While the motivation for decoding information from some hidden space is straightforward, the strict separation of the encoding and decoding steps into an encoder and a decoder in the model architecture is not strictly necessary. Compared to the task of autoregressive language modeling in the target language, machine translation simply has an additional source sentence as context. Given that neural language models nowadays can already handle rather long contexts in the target language, it is natural to ask whether simply concatenating the source and target sentences and training a language model to do translation would work. In this work, we investigate this concept for machine translation. Specifically, we experiment with bilingual translation, translation with additional target monolingual data, and multilingual translation. In all cases, this alternative approach performs on par with the baseline encoder-decoder Transformer, suggesting that an encoder-decoder architecture might be redundant for neural machine translation.
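The concatenation idea can be made concrete with a small data-preparation sketch. The separator and end-of-sentence symbols, the function name, and the choice to compute the loss only on target positions are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Tuple

SEP = "<sep>"  # hypothetical separator between source and target
EOS = "</s>"   # hypothetical end-of-sentence symbol


def build_lm_example(source_tokens: List[str],
                     target_tokens: List[str]) -> Tuple[List[str], List[int]]:
    """Concatenate source and target into one sequence for a language model.

    Returns the token sequence and a loss mask that is 1 on positions
    belonging to the target side (including EOS) and 0 on the source side
    and separator. Masking out the source is an assumption of this sketch.
    """
    tokens = source_tokens + [SEP] + target_tokens + [EOS]
    loss_mask = [0] * (len(source_tokens) + 1) + [1] * (len(target_tokens) + 1)
    return tokens, loss_mask


tokens, mask = build_lm_example(["ich", "bin", "hier"], ["i", "am", "here"])
for token, flag in zip(tokens, mask):
    print(f"{token:>6}  loss={flag}")
```

At inference time, a model trained this way would be prompted with the source tokens followed by the separator and asked to continue the sequence until it emits the end-of-sentence symbol.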