 Machine Translation


Understanding Attention in Natural Language Processing with 3 Projects

#artificialintelligence

In this blog post, I'll summarize my understanding of attention as used in natural language processing (NLP). As a machine learning and NLP self-learner, when I was first exposed to the idea of attention, I felt overwhelmed by its many variations and all the nitty-gritty details involved in implementing it. Now, after reading articles, blogs and code, watching YouTube videos and implementing it myself in several projects, I find that it is actually not that hard to understand in hindsight. By sharing what I learned along the way, I hope to help those who are going through the same learning process, especially beginners like I was a couple of months ago, speed up their progress and make it a bit more enjoyable. The concept of attention first became widely known through its use in the sequence-to-sequence (seq2seq) model for neural machine translation.


Automatic Text Simplification of News Articles in the Context of Public Broadcasting

arXiv.org Artificial Intelligence

This report summarizes the work carried out by the authors during the Twelfth Montreal Industrial Problem Solving Workshop, held at Université de Montréal in August 2022. The team tackled a problem submitted by CBC/Radio-Canada on the theme of Automatic Text Simplification (ATS). In order to make its written content more widely accessible, and to support its second-language teaching activities, CBC/RC has recently been exploring the potential of automatic methods to simplify texts. They have developed a modular lexical simplification system (LSS), which identifies complex words in French and English texts, and replaces them with simpler, more common equivalents. Recently however, the ATS research community has proposed a number of approaches that rely on deep learning methods to perform more elaborate transformations, not limited to just lexical substitutions, but covering syntactic restructuring and conceptual simplifications as well.
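
To make the idea of lexical substitution concrete, here is a minimal sketch of dictionary-based word replacement. It is not the CBC/RC system: the `SIMPLER` lexicon and the `simplify` function are hypothetical names introduced for illustration, and a real system would also need to identify complex words (for example by frequency) and check that a substitute fits the context.

```python
import re

# Hypothetical lexicon mapping complex words to simpler equivalents
# (illustrative entries only, not taken from any real system).
SIMPLER = {"utilize": "use", "commence": "start", "purchase": "buy"}


def simplify(text, lexicon=SIMPLER):
    """Replace each word found in the lexicon with its simpler equivalent."""

    def swap(match):
        word = match.group(0)
        replacement = lexicon.get(word.lower())
        if replacement is None:
            return word  # not in the lexicon: leave the word unchanged
        # Preserve an initial capital letter.
        return replacement.capitalize() if word[0].isupper() else replacement

    # [^\W\d_]+ matches runs of letters, including accented ones.
    return re.sub(r"[^\W\d_]+", swap, text)


print(simplify("Commence the meeting and utilize the notes."))
```

Purely lexical substitution like this leaves sentence structure untouched, which is the limitation the deep learning approaches mentioned above aim to address.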


Differentiable N-gram Objective on Abstractive Summarization

arXiv.org Artificial Intelligence

ROUGE is a standard automatic evaluation metric based on n-grams for sequence-to-sequence tasks, while cross-entropy loss is an essential objective of neural network language models that optimizes at the unigram level. We present differentiable n-gram objectives that attempt to alleviate the discrepancy between the training criterion and the evaluation criterion. The objective maximizes the probabilistic weight of matched sub-sequences. The novelty of our work is that the objective weights the matched sub-sequences equally and does not cap the number of matched sub-sequences at the ground-truth count of n-grams in the reference sequence. We jointly optimize the cross-entropy loss and the proposed objective, providing a decent ROUGE score enhancement on the abstractive summarization datasets CNN/DM and XSum and outperforming alternative n-gram objectives.
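
The difference between capping and not capping matched n-grams can be illustrated with plain counts. The sketch below is not the paper's differentiable objective; it only contrasts the two counting rules on discrete tokens, and the function names `ngrams` and `match_counts` are assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def match_counts(candidate, reference, n):
    """Count candidate n-grams that also occur in the reference.

    Returns (clipped, unclipped):
      clipped   caps each n-gram's matches at its count in the reference,
                as ROUGE-style overlap counting does;
      unclipped counts every occurrence of a matching n-gram.
    """
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    unclipped = sum(count for gram, count in cand.items() if gram in ref)
    return clipped, unclipped


reference = "the cat sat on the mat".split()
candidate = "the the the cat".split()

print(match_counts(candidate, reference, 1))  # (3, 4)
print(match_counts(candidate, reference, 2))  # (1, 1)
```

For unigrams, "the" appears three times in the candidate but only twice in the reference, so the clipped count is 3 while the unclipped count is 4.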


Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation

arXiv.org Artificial Intelligence

In this paper, we study the use of a deep Transformer translation model for the CCMT 2022 Chinese-Thai low-resource machine translation task. We first explore the experiment settings (including the number of BPE merge operations, dropout probability, embedding size, etc.) for the low-resource scenario with the 6-layer Transformer. Considering that increasing the number of layers also increases the regularization on new model parameters (dropout modules are also introduced when using more layers), we adopt the highest-performing setting but increase the depth of the Transformer to 24 layers to obtain improved translation quality. Our work obtains SOTA performance in Chinese-to-Thai translation in the constrained evaluation.


SYMBA: Symbolic Computation of Squared Amplitudes in High Energy Physics with Machine Learning

arXiv.org Artificial Intelligence

The cross section is one of the most important physical quantities in high-energy physics and the most time consuming to compute. While machine learning has proven to be highly successful in numerical calculations in high-energy physics, analytical calculations using machine learning are still in their infancy. In this work, we use a sequence-to-sequence model, specifically, a transformer, to compute a key element of the cross section calculation, namely, the squared amplitude of an interaction. We show that a transformer model is able to predict correctly 97.6% and 99% of squared amplitudes of QCD and QED processes, respectively, at a speed that is up to orders of magnitude faster than current symbolic computation frameworks. We discuss the performance of the current model, its limitations and possible future directions for this work.


Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing

arXiv.org Artificial Intelligence

We investigate how humans perform the task of dubbing video content from one language into another, leveraging a novel corpus of 319.57 hours of video from 54 professionally produced titles. This is the first such large-scale study we are aware of. The results challenge a number of assumptions commonly made in both qualitative literature on human dubbing and machine-learning literature on automatic dubbing, arguing for the importance of vocal naturalness and translation quality over commonly emphasized isometric (character length) and lip-sync constraints, and for a more qualified view of the importance of isochronic (timing) constraints. We also find substantial influence of the source-side audio on human dubs through channels other than the words of the translation, pointing to the need for research on ways to preserve speech characteristics, as well as semantic transfer such as emphasis/emotion, in automatic dubbing systems.


Improving Automated Program Repair with Domain Adaptation

arXiv.org Artificial Intelligence

Automated Program Repair (APR) is defined as the process of fixing a bug or defect in source code with an automated tool. APR tools have recently achieved promising results by leveraging state-of-the-art Neural Language Processing (NLP) techniques. APR tools such as TFix and CodeXGLUE, which combine text-to-text transformers with software-specific techniques, currently outperform alternatives. However, in most APR studies the train and test sets are chosen from the same set of projects. In reality, APR models are meant to generalize to new and different projects. Therefore, there is a potential threat that APR models reported as highly effective perform poorly when the characteristics of the new project or its bugs differ from those of the training set (domain shift). In this study, we first define and measure the domain shift problem in automated program repair. Then, we propose a domain adaptation framework that can adapt an APR model for a given target project. We conduct an empirical study with three domain adaptation methods (FullFineTuning, TuningWithLightWeightAdapterLayers, and CurriculumLearning) using two state-of-the-art domain adaptation tools (TFix and CodeXGLUE) and two APR models on 611 bugs from 19 projects. The results show that our proposed framework can improve the effectiveness of TFix by 13.05% and CodeXGLUE by 23.4%. Another contribution of this study is the proposal of a data synthesis method to address the lack of labelled data in APR. We leverage transformers to create a bug generator model. We use the generated synthetic data to domain-adapt TFix and CodeXGLUE on projects with no data (zero-shot learning), which results in an average improvement of 5.76% and 24.42% for TFix and CodeXGLUE, respectively.


AIhub monthly digest: December 2022 – AI around the world, teleoperation, and multilingual translation

AIHub

Welcome to our December 2022 monthly digest, where you can catch up with any AIhub stories you may have missed, get the low-down on recent events, and much more. This month, we hear from best paper award winners at ICIP and NeurIPS, and find out more about teleoperation, multilingual translation, and quality-diversity algorithms. We also have exciting news, in the form of a new focus series. We're delighted to announce the launch of our new focus series on AI around the world, where we cover exciting applications of AI across the globe. To kick off the series, we spoke with Rose Nakasi.


A Mutation-based Text Generation for Adversarial Machine Learning Applications

arXiv.org Artificial Intelligence

Currently, text generation is widely used in Machine Learning (ML)-based or Artificial Intelligence (AI)-based natural language applications such as language-to-language translation, document summarization, and headline or abstract generation. Those applications can be classified into different categories. In one classification, they can be divided into short versus long text generation applications. Short text generation applications include predicting the next word or statement, image caption generation, short language translation, and document summarization. Long text generation applications include long text story completion, review generation, language translation, poetry generation, and question answering.


Parameter-efficient Zero-shot Transfer for Cross-Language Dense Retrieval with Adapters

arXiv.org Artificial Intelligence

A popular approach to creating a zero-shot cross-language retrieval model is to substitute a monolingual pretrained language model in the retrieval model with a multilingual pretrained language model such as Multilingual BERT. This multilingual model is fine-tuned to the retrieval task with monolingual data such as English MS MARCO, using the same training recipe as the monolingual retrieval model. However, such transferred models suffer from mismatches in the languages of the input text during training and inference. In this work, we propose transferring monolingual retrieval models using adapters, a parameter-efficient component for a transformer network. By combining adapters pretrained on language tasks for a specific language with task-specific adapters, prior work has shown that adapter-enhanced models perform better than fine-tuning the entire model when transferring across languages in various NLP tasks. By constructing dense retrieval models with adapters, we show that models trained with monolingual data are more effective than fine-tuning the entire model when transferring to a Cross Language Information Retrieval (CLIR) setting. However, we found that the prior suggestion of replacing the language adapters to match the target language at inference time is suboptimal for dense retrieval models. We provide an in-depth analysis of this discrepancy between other cross-language NLP tasks and CLIR.