 Machine Translation


PicTalky: Augmentative and Alternative Communication Software for Language Developmental Disabilities

arXiv.org Artificial Intelligence

Augmentative and alternative communication (AAC) is a practical means of communication for people with language disabilities. In this study, we propose PicTalky, which is an AI-based AAC system that helps children with language developmental disabilities to improve their communication skills and language comprehension abilities. PicTalky can process both text and pictograms more accurately by connecting a series of neural-based NLP modules. Moreover, we perform quantitative and qualitative analyses on the essential features of PicTalky. It is expected that those suffering from language problems will be able to express their intentions or desires more easily and improve their quality of life by using this service. We have made the models freely available alongside a demonstration of the Web interface. Furthermore, we implemented robotics AAC for the first time by applying PicTalky to the NAO robot.


Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation

arXiv.org Artificial Intelligence

In this paper, we explore the challenging problem of performing a generative task in a target language when labeled data is only available in English, using summarization as a case study. We assume a strict setting with no access to parallel data or machine translation and find that common transfer learning approaches struggle in this setting, as a generative multilingual model fine-tuned purely on English catastrophically forgets how to generate non-English. Given the recent rise of parameter-efficient adaptation techniques, we conduct the first investigation into how one such method, prompt tuning (Lester et al., 2021), can overcome catastrophic forgetting to enable zero-shot cross-lingual generation. Our experiments show that parameter-efficient prompt tuning provides gains over standard fine-tuning when transferring between less-related languages, e.g., from English to Thai. However, a significant gap still remains between these methods and fully-supervised baselines. To improve cross-lingual transfer further, we explore several approaches, including: (1) mixing in unlabeled multilingual data, and (2) explicitly factoring prompts into recombinable language and task components. Our approaches can provide further quality gains, suggesting that robust zero-shot cross-lingual generation is within reach.


Precisely the Point: Adversarial Augmentations for Faithful and Informative Text Generation

arXiv.org Artificial Intelligence

Though model robustness has been extensively studied in language understanding, the robustness of Seq2Seq generation remains understudied. In this paper, we conduct the first quantitative analysis on the robustness of pre-trained Seq2Seq models. We find that even the current SOTA pre-trained Seq2Seq model (BART) is still vulnerable, which leads to significant degeneration in faithfulness and informativeness for text generation tasks. This motivates us to further propose a novel adversarial augmentation framework, namely AdvSeq, for generally improving the faithfulness and informativeness of Seq2Seq models by enhancing their robustness. AdvSeq automatically constructs two types of adversarial augmentations during training: implicit adversarial samples built by perturbing word representations, and explicit adversarial samples built by word swapping. Both effectively improve Seq2Seq robustness. Extensive experiments on three popular text generation tasks demonstrate that AdvSeq significantly improves both the faithfulness and informativeness of Seq2Seq generation under both automatic and human evaluation settings.


Hard Gate Knowledge Distillation -- Leverage Calibration for Robust and Reliable Language Model

arXiv.org Artificial Intelligence

In knowledge distillation, a student model is trained with supervision from both the knowledge of a teacher and observations drawn from a training data distribution. The knowledge of a teacher is considered to hold inter-class relations that provide meaningful supervision to a student; hence, much effort has been put into finding such knowledge to be distilled. In this paper, we explore a question that has been given little attention: "when to distill such knowledge." The question is answered in our work with the concept of model calibration; we view a teacher model not only as a source of knowledge but also as a gauge to detect miscalibration of a student. This simple yet novel view leads to a hard gate knowledge distillation scheme that switches between learning from a teacher model and learning from training data. We verify the gating mechanism in the context of natural language generation at both the token level and the sentence level. Empirical comparisons with strong baselines show that hard gate knowledge distillation not only improves model generalization, but also significantly lowers model calibration error.
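The switching idea in this abstract can be illustrated with a small token-level sketch. The abstract does not state the gating rule, so the rule below is an assumption made for illustration: the student is treated as miscalibrated when it assigns more probability to the gold token than the teacher does, and only then does it learn from the teacher. The function name `hard_gate_token_loss` and the toy distributions are hypothetical.

```python
import math


def hard_gate_token_loss(student_probs, teacher_probs, gold_index):
    """Return (supervision_source, loss) for a single token position.

    Assumed gate (not taken from the abstract): if the student is more
    confident than the teacher on the gold token, distill from the teacher;
    otherwise train on the data with plain cross-entropy.
    All probabilities in student_probs are assumed to be strictly positive.
    """
    p_student = student_probs[gold_index]
    p_teacher = teacher_probs[gold_index]

    if p_student > p_teacher:
        # Gate open: KL(teacher || student) pulls the student toward the teacher.
        loss = sum(
            t * math.log(t / s)
            for t, s in zip(teacher_probs, student_probs)
            if t > 0
        )
        return "teacher", loss

    # Gate closed: negative log-likelihood of the gold token.
    return "data", -math.log(p_student)


teacher = [0.6, 0.3, 0.1]
source_a, loss_a = hard_gate_token_loss([0.9, 0.05, 0.05], teacher, gold_index=0)
source_b, loss_b = hard_gate_token_loss([0.3, 0.4, 0.3], teacher, gold_index=0)
```

The gate is "hard" in the sense that each position receives exactly one of the two supervision signals, rather than a weighted mix of both.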


Dynamic Position Encoding for Transformers

arXiv.org Artificial Intelligence

Recurrent models have been dominating the field of neural machine translation (NMT) for the past few years. Transformers (Vaswani et al., 2017) have radically changed it by proposing a novel architecture that relies on a feed-forward backbone and a self-attention mechanism. Although Transformers are powerful, they could fail to properly encode sequential/positional information due to their non-recurrent nature. To solve this problem, position embeddings are defined exclusively for each time step to enrich word information. However, such embeddings are fixed after training regardless of the task and the word ordering system of the source or target language. In this paper, we propose a novel architecture with new position embeddings depending on the input text to address this shortcoming by taking the order of target words into consideration. Instead of using predefined position embeddings, our solution generates new embeddings to refine each word's position information. Since we do not dictate the position of source tokens and learn them in an end-to-end fashion, we refer to our method as dynamic position encoding (DPE). We evaluated the impact of our model on multiple datasets to translate from English into German, French, and Italian and observed meaningful improvements in comparison to the original Transformer.


Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering

arXiv.org Artificial Intelligence

Knowledge underpins reasoning. Recent research demonstrates that when relevant knowledge is provided as additional context to commonsense question answering (QA), it can substantially enhance performance, even on top of state-of-the-art models. The fundamental challenge is where and how to find such knowledge that is high quality and on point with respect to the question; knowledge retrieved from knowledge bases is incomplete, and knowledge generated from language models is inconsistent. We present Rainier, or Reinforced Knowledge Introspector, which learns to generate contextually relevant knowledge in response to given questions. Our approach starts by imitating knowledge generated by GPT-3, then learns to generate its own knowledge via reinforcement learning, where rewards are shaped based on the increased performance on the resulting question answering. Rainier demonstrates substantial and consistent performance gains when tested over 9 different commonsense benchmarks, including 5 datasets that are seen during model training and 4 datasets that are kept unseen. Our work is the first to report that knowledge generated by models that are orders of magnitude smaller than GPT-3, even without direct supervision on the knowledge itself, can exceed the quality of commonsense knowledge elicited from GPT-3.


Welcome Back!

Communications of the ACM

Computational sciences in the India region are going through an exciting time. While India has always had significant strength in theoretical computer science (CS), in recent years it has developed substantial presence and maturity in other, more applied areas of CS such as hardware and computer architecture, data science and artificial intelligence (AI), and cyber-security. Alongside pure research, there has been a significant push toward lab-to-field projects and technology transfer and deployment, creating broad impact on the region and beyond. Significant efforts have been made on the democratization of education through online courses, enabling the vast population to learn from a relatively limited number of available experts. All these activities have continued to bolster India's already strong IT industry and have been a factor in the huge increase in the number of startups (from under 1,000 in 2016 to over 60,000 in 2022), with the number of unicorn startups reaching 100.


Using AI to Translate Speech For a Primarily Oral Language

#artificialintelligence

AI-powered speech translation has mainly focused on written languages, yet nearly 3,500 living languages are primarily spoken and don't have a widely used writing system. This makes it impossible to build machine translation tools using standard techniques, which require large amounts of written text in order to train an AI model. To address this challenge, we've built the first AI-powered speech-to-speech translation system for Hokkien, a primarily oral language that's widely spoken within the Chinese diaspora but lacks a standard written form. We're open-sourcing our Hokkien translation models, evaluation datasets and research papers so that others can reproduce and build on our work. The translation system is part of our Universal Speech Translator project, which is developing new AI methods that we hope will eventually allow real-time speech-to-speech translation across many languages.


Turning Fixed to Adaptive: Integrating Post-Evaluation into Simultaneous Machine Translation

arXiv.org Artificial Intelligence

Simultaneous machine translation (SiMT) (Gu et al., 2017; Ma et al., 2019; Arivazhagan et al., 2019; Ma et al., 2020; Zhang and Feng, 2021b, 2022d) starts translation before reading the whole source sentence. It seeks to achieve good latency-quality tradeoffs and is suitable for various scenarios with different latency tolerances. Compared to full-sentence machine translation, SiMT is more challenging because it lacks partial source content in translation and needs to decide on translation policy additionally.

However, the previous methods, including fixed and adaptive policies, lack evaluation before taking the next action. For fixed policy (Ma et al., 2019; Elbayad et al., 2020; Zhang et al., 2021; Zhang and Feng, 2021c), the model generates translation according to the predefined translation rules. Although it only relies on simple training methods, it cannot make full use of the context to decide an appropriate translation policy. For adaptive policy (Gu et al., 2017; Arivazhagan et al., 2019; Ma et al., 2020; Zhang et al., 2022), the model can
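To make the notion of a fixed policy concrete, here is a minimal sketch of one well-known predefined rule, the wait-k schedule introduced by Ma et al. (2019): read k source tokens first, then alternate between writing one target token and reading one more source token. The excerpt does not say which fixed rule is meant, so choosing wait-k is an assumption, and the function name `wait_k_actions` is hypothetical.

```python
def wait_k_actions(source_len, target_len, k):
    """Return the READ/WRITE action sequence of a wait-k schedule.

    Before writing target token number t (1-based), the schedule requires
    min(k + t - 1, source_len) source tokens to have been read. The rule
    depends only on positions, never on the content of the sentence.
    """
    actions = []
    read, written = 0, 0
    while written < target_len:
        needed = min(k + written, source_len)
        if read < needed:
            actions.append("READ")
            read += 1
        else:
            actions.append("WRITE")
            written += 1
    return actions


schedule = wait_k_actions(source_len=4, target_len=4, k=2)
full_sentence = wait_k_actions(source_len=3, target_len=2, k=5)
```

Because the schedule ignores what has actually been read, it illustrates the limitation noted above: a fixed rule cannot use context to decide when more source input is needed. Setting k at or above the source length reduces the schedule to full-sentence translation.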


Gui at MixMT 2022 : English-Hinglish: An MT approach for translation of code mixed data

arXiv.org Artificial Intelligence

Code-mixed machine translation has become an important task in multilingual communities, and extending machine translation to code-mixed data has become a common task for these languages. In the shared tasks of WMT 2022, we try to tackle the same for both English + Hindi to Hinglish and Hinglish to English. The first task dealt with both Roman and Devanagari script, as we had monolingual data in both English and Hindi, whereas the second task only had data in Roman script. To our knowledge, we achieved one of the top ROUGE-L and WER scores for the first task of Monolingual to Code-Mixed machine translation. In this paper, we discuss the use of mBART with some special pre-processing and post-processing (transliteration from Devanagari to Roman) for the first task in detail and the experiments that we performed for the second task of translating code-mixed Hinglish to monolingual English.
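For readers unfamiliar with the WER metric mentioned above, the following is a minimal sketch of word error rate in its standard form: the word-level edit distance between hypothesis and reference, divided by the reference length. It is not the shared task's official scorer, whose tokenization and normalization are not described here; the function name `word_error_rate` and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


wer_exact = word_error_rate("main kal office ja raha hoon", "main kal office ja raha hoon")
wer_one_sub = word_error_rate("main kal office ja raha hoon", "main kal office jaa raha hoon")
```

Lower is better for WER, and the value can exceed 1.0 when the hypothesis is much longer than the reference.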