Machine Translation
Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling
Gong, Hongyu, Tang, Yun, Pino, Juan, Li, Xian
Multi-head attention has each of the attention heads collect salient information from different parts of an input sequence, making it a powerful mechanism for sequence modeling. Multilingual and multi-domain learning are common scenarios for sequence modeling, where the key challenge is to maximize positive transfer and mitigate negative transfer across languages and domains. In this paper, we find that non-selective attention sharing is sub-optimal for achieving good generalization across all languages and domains. We further propose attention sharing strategies to facilitate parameter sharing and specialization in multilingual and multi-domain sequence modeling. Our approach automatically learns shared and specialized attention heads for different languages and domains to mitigate their interference. Evaluated in various tasks including speech recognition, text-to-text and speech-to-text translation, the proposed attention sharing strategies consistently bring gains to sequence models built upon multi-head attention. For speech-to-text translation, our approach yields an average of $+2.0$ BLEU over $13$ language directions in multilingual setting and $+2.0$ BLEU over $3$ domains in multi-domain setting.
Leveraging Language to Learn Program Abstractions and Search Heuristics
Wong, Catherine, Ellis, Kevin, Tenenbaum, Joshua B., Andreas, Jacob
Inductive program synthesis, or inferring programs from examples of desired behavior, offers a general paradigm for building interpretable, robust, and generalizable machine learning systems. Effective program synthesis depends on two key ingredients: a strong library of functions from which to build programs, and an efficient search strategy for finding programs that solve a given task. We introduce LAPS (Language for Abstraction and Program Search), a technique for using natural language annotations to guide joint learning of libraries and neurally-guided search models for synthesis. When integrated into a state-of-the-art library learning system (DreamCoder), LAPS produces higher-quality libraries and improves search efficiency and generalization on three domains -- string editing, image composition, and abstract reasoning about scenes -- even when no natural language hints are available at test time.
Central Kurdish machine translation: First large scale parallel corpus and experiments
Amini, Zhila, Mohammadamini, Mohammad, Hosseini, Hawre, Mansouri, Mehran, Jaff, Daban
While the computational processing of Kurdish has experienced a relative increase, the machine translation of this language seems to be lacking a considerable body of scientific work. This is in part due to the lack of resources especially curated for this task. In this paper, we present the first large scale parallel corpus of Central Kurdish-English, Awta, containing 229,222 pairs of manually aligned translations. Our corpus is collected from different text genres and domains in an attempt to build more robust and real-world applications of machine translation. We make a portion of this corpus publicly available in order to foster research in this area. Further, we build several neural machine translation models in order to benchmark the task of Kurdish machine translation. Additionally, we perform extensive experimental analysis of results in order to identify the major challenges that Central Kurdish machine translation faces. These challenges include language-dependent and-independent ones as categorized in this paper, the first group of which are aware of Central Kurdish linguistic properties on different morphological, syntactic and semantic levels. Our best performing systems achieve 22.72 and 16.81 in BLEU score for Ku$\rightarrow$EN and En$\rightarrow$Ku, respectively.
Evaluating Gender Bias in Hindi-English Machine Translation
Gupta, Gauri, Ramesh, Krithika, Singh, Sanjay
With language models being deployed increasingly in the real world, it is essential to address the issue of the fairness of their outputs. The word embedding representations of these language models often implicitly draw unwanted associations that form a social bias within the model. The nature of gendered languages like Hindi, poses an additional problem to the quantification and mitigation of bias, owing to the change in the form of the words in the sentence, based on the gender of the subject. Additionally, there is sparse work done in the realm of measuring and debiasing systems for Indic languages. In our work, we attempt to evaluate and quantify the gender bias within a Hindi-English machine translation system. We implement a modified version of the existing TGBI metric based on the grammatical considerations for Hindi. We also compare and contrast the resulting bias measurements across multiple metrics for pre-trained embeddings and the ones learned by our machine translation model.
Code to Comment Translation: A Comparative Study on Model Effectiveness & Errors
Mahmud, Junayed, Faisal, Fahim, Arnob, Raihan Islam, Anastasopoulos, Antonios, Moran, Kevin
Automated source code summarization is a popular software engineering research topic wherein machine translation models are employed to "translate" code snippets into relevant natural language descriptions. Most evaluations of such models are conducted using automatic reference-based metrics. However, given the relatively large semantic gap between programming languages and natural language, we argue that this line of research would benefit from a qualitative investigation into the various error modes of current state-of-the-art models. Therefore, in this work, we perform both a quantitative and qualitative comparison of three recently proposed source code summarization models. In our quantitative evaluation, we compare the models based on the smoothed BLEU-4, METEOR, and ROUGE-L machine translation metrics, and in our qualitative evaluation, we perform a manual open-coding of the most common errors committed by the models when compared to ground truth captions. Our investigation reveals new insights into the relationship between metric-based performance and model prediction errors grounded in an empirically derived error taxonomy that can be used to drive future research efforts
AI 50 2021: America's Most Promising Artificial Intelligence Companies
The Covid-19 pandemic was devastating for many industries, but it only accelerated the use of artificial intelligence across the U.S. economy. Amid the crisis, companies scrambled to create new services for remote workers and students, beef up online shopping and dining options, make customer call centers more efficient and speed development of important new drugs. Even as applications of machine learning and perception platforms become commonplace, a thick layer of hype and fuzzy jargon clings to AI-enabled software.That makes it tough to identify the most compelling companies in the space--especially those finding new ways to use AI that create value by making humans more efficient, not redundant. With this in mind, Forbes has partnered with venture firms Sequoia Capital and Meritech Capital to create our third annual AI 50, a list of private, promising North American companies that are using artificial intelligence in ways that are fundamental to their operations. To be considered, businesses must be privately-held and utilizing machine learning (where systems learn from data to improve on tasks), natural language processing (which enables programs to "understand" written or spoken language) or computer vision (which relates to how machines "see"). AI companies incubated at, largely funded through or acquired by large tech, manufacturing or industrial firms aren't eligible for consideration. Our list was compiled through a submission process open to any AI company in the U.S. and Canada. The application asked companies to provide details on their technology, business model, customers and financials like funding, valuation and revenue history (companies had the option to submit information confidentially, to encourage greater transparency). Forbes received several hundred entries, of which nearly 400 qualified for consideration. From there, our data partners applied an algorithm to identify 100 companies with the highest quantitative scores--and that also made diversity a priority. Next, a panel of expert AI judges evaluated the finalists to find the 50 most compelling companies (they were precluded from judging companies in which they have a vested interest). Among trends this year are what Sequoia Capital's Konstantine Buhler calls AI workbench companies--building of platforms tailored to different enterprises, including Dataiku, DataRobot Domino Data and Databricks.
Improving Paraphrase Detection with the Adversarial Paraphrasing Task
Nighojkar, Animesh, Licato, John
If two sentences have the same meaning, it should follow that they are equivalent in their inferential properties, i.e., each sentence should textually entail the other. However, many paraphrase datasets currently in widespread use rely on a sense of paraphrase based on word overlap and syntax. Can we teach them instead to identify paraphrases in a way that draws on the inferential properties of the sentences, and is not over-reliant on lexical and syntactic similarities of a sentence pair? We apply the adversarial paradigm to this question, and introduce a new adversarial method of dataset creation for paraphrase identification: the Adversarial Paraphrasing Task (APT), which asks participants to generate semantically equivalent (in the sense of mutually implicative) but lexically and syntactically disparate paraphrases. These sentence pairs can then be used both to test paraphrase identification models (which get barely random accuracy) and then improve their performance. To accelerate dataset generation, we explore automation of APT using T5, and show that the resulting dataset also improves accuracy. We discuss implications for paraphrase detection and release our dataset in the hope of making paraphrase detection models better able to detect sentence-level meaning equivalence.
NG+ : A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning
Yang, Minghan, Xu, Dong, Cui, Qiwen, Wen, Zaiwen, Xu, Pengxiang
In this paper, a novel second-order method called NG+ is proposed. By following the rule ``the shape of the gradient equals the shape of the parameter", we define a generalized fisher information matrix (GFIM) using the products of gradients in the matrix form rather than the traditional vectorization. Then, our generalized natural gradient direction is simply the inverse of the GFIM multiplies the gradient in the matrix form. Moreover, the GFIM and its inverse keeps the same for multiple steps so that the computational cost can be controlled and is comparable with the first-order methods. A global convergence is established under some mild conditions and a regret bound is also given for the online learning setting. Numerical results on image classification with ResNet50, quantum chemistry modeling with Schnet, neural machine translation with Transformer and recommendation system with DLRM illustrate that GN+ is competitive with the state-of-the-art methods.
An Empirical Survey of Data Augmentation for Limited Data Learning in NLP
Chen, Jiaao, Tam, Derek, Raffel, Colin, Bansal, Mohit, Yang, Diyi
NLP has achieved great progress in the past decade through the use of neural models and large labeled datasets. The dependence on abundant data prevents NLP models from being applied to low-resource settings or novel tasks where significant time, money, or expertise is required to label massive amounts of textual data. Recently, data augmentation methods have been explored as a means of improving data efficiency in NLP. To date, there has been no systematic empirical overview of data augmentation for NLP in the limited labeled data setting, making it difficult to understand which methods work in which settings. In this paper, we provide an empirical survey of recent progress on data augmentation for NLP in the limited labeled data setting, summarizing the landscape of methods (including token-level augmentations, sentence-level augmentations, adversarial augmentations, and hidden-space augmentations) and carrying out experiments on 11 datasets covering topics/news classification, inference tasks, paraphrasing tasks, and single-sentence tasks. Based on the results, we draw several conclusions to help practitioners choose appropriate augmentations in different settings and discuss the current challenges and future directions for limited data learning in NLP.
Common Sense Beyond English: Evaluating and Improving Multilingual Language Models for Commonsense Reasoning
Lin, Bill Yuchen, Lee, Seyeon, Qiao, Xiaoyang, Ren, Xiang
Commonsense reasoning research has so far been limited to English. We aim to evaluate and improve popular multilingual language models (ML-LMs) to help advance commonsense reasoning (CSR) beyond English. We collect the Mickey Corpus, consisting of 561k sentences in 11 different languages, which can be used for analyzing and improving ML-LMs. We propose Mickey Probe, a language-agnostic probing task for fairly evaluating the common sense of popular ML-LMs across different languages. In addition, we also create two new datasets, X-CSQA and X-CODAH, by translating their English versions to 15 other languages, so that we can evaluate popular ML-LMs for cross-lingual commonsense reasoning. To improve the performance beyond English, we propose a simple yet effective method -- multilingual contrastive pre-training (MCP). It significantly enhances sentence representations, yielding a large performance gain on both benchmarks.