Machine Translation
IBM Research at ACL 2020
The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), the premier annual conference on AI and language, takes place July 5-10. As is the case with most events currently, ACL will be virtual this year due to COVID-19. At IBM Research AI, we're excited to share with you, wherever you might be in the world, all the work we'll have at ACL 2020 designed to advance AI for the enterprise. The ability of AI to master language has been one of IBM Research AI's key areas of focus for years. The field of Natural Language Processing (NLP) is constantly evolving in efforts to better equip AI with the ability to communicate the way we humans can. It's an incredibly challenging area of research. An AI must identify, decipher, and navigate natural language barriers: slang, idioms, acronyms, different languages, and extracting meaning from multi-format documents, to name a few. To tackle these challenges, IBM released earlier this year a new, four-part mastering language taxonomy…
An AI Researcher's Exploration of 200 Machine Learning Tools
To better understand the landscape of available tools for machine learning production, I decided to look up every AI/ML tool I could find. After filtering out application companies (e.g. companies that use ML to provide business analytics), tools that aren't being actively developed, and tools that nobody uses, I got 202 tools. Please let me know if there are tools you think I should include but that aren't on the list yet! I categorize the tools based on which step of the workflow they support. I don't include Project setup since it requires project management tools, not ML tools.
Java To Python And Back, AI That Translates Programming Languages
The Commonwealth Bank of Australia spent around $750 million and 5 years of work to convert its platform from COBOL to Java. Migrating an existing codebase to a modern or more efficient language like Java or C requires expertise in both the source and target languages, and is often costly. Usually, a transcompiler is deployed that converts source code from one high-level programming language (such as C or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language. They typically rely on handcrafted rewrite rules, applied to the source code abstract syntax tree.
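To make "handcrafted rewrite rules applied to the abstract syntax tree" concrete, here is a minimal sketch using Python's standard `ast` module. The rule itself (turning two-argument `pow(a, b)` calls into `a ** b`) and the names `PowCallToOperator` and `rewrite` are illustrative assumptions, not taken from any real transcompiler; a production tool would carry many such rules and would translate between languages rather than within one.

```python
import ast


class PowCallToOperator(ast.NodeTransformer):
    """Hypothetical hand-written rewrite rule: pow(a, b) -> a ** b."""

    def visit_Call(self, node):
        # Rewrite child nodes first so nested pow(...) calls are handled too.
        self.generic_visit(node)
        is_two_arg_pow = (
            isinstance(node.func, ast.Name)
            and node.func.id == "pow"
            and len(node.args) == 2
            and not node.keywords
        )
        if is_two_arg_pow:
            base, exponent = node.args
            return ast.BinOp(left=base, op=ast.Pow(), right=exponent)
        # Anything else (e.g. three-argument pow) is left untouched.
        return node


def rewrite(source: str) -> str:
    """Parse source, apply the rule to its AST, and emit source again."""
    tree = ast.parse(source)
    tree = PowCallToOperator().visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


print(rewrite("y = pow(x, 2)"))  # prints: y = x ** 2
```

Even this toy rule hints at why the approach is costly: it silently assumes `pow` has not been shadowed by a user definition, and every such corner case has to be anticipated by hand.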
Correction of Faulty Background Knowledge based on Condition Aware and Revise Transformer for Question Answering
Zhao, Xinyan, Feng, Xiao, Zhong, Haoming, Yao, Jun, Chen, Huanhuan
The study of question answering has received increasing attention in recent years. This work focuses on providing an answer that is compatible with both user intent and conditioning information corresponding to the question, such as delivery status and stock information in e-commerce. However, these conditions may be wrong or incomplete in real-world applications. Although existing question answering systems have considered external information, such as categorical attributes and triples in a knowledge base, they all assume that the external information is correct and complete. To alleviate the effect of defective condition values, this paper proposes the condition aware and revise Transformer (CAR-Transformer). CAR-Transformer (1) revises each condition value based on the whole conversation and the original condition values, and (2) encodes the revised conditions and uses the condition embeddings to select an answer. Experimental results on a real-world customer service dataset demonstrate that the CAR-Transformer can still select an appropriate reply when the conditions corresponding to the question contain wrong or missing values, and that it substantially outperforms baseline models on automatic and human evaluations. The proposed CAR-Transformer can be extended to other NLP tasks that need to consider conditioning information.
Adversarial Mutual Information for Text Generation
Pan, Boyuan, Yang, Yazheng, Liang, Kaizhao, Kailkhura, Bhavya, Jin, Zhongming, Hua, Xian-Sheng, Cai, Deng, Li, Bo
Recent advances in maximizing mutual information (MI) between the source and target have demonstrated its effectiveness in text generation. However, previous works paid little attention to modeling the backward network of MI (i.e., dependency from the target to the source), which is crucial to the tightness of the variational information maximization lower bound. In this paper, we propose Adversarial Mutual Information (AMI): a text generation framework which is formed as a novel saddle point (min-max) optimization aiming to identify joint interactions between the source and target. Within this framework, the forward and backward networks are able to iteratively promote or demote each other's generated instances by comparing the real and synthetic data distributions. We also develop a latent noise sampling strategy that leverages random variations in the high-level semantic space to enhance long-term dependency in the generation process. Extensive experiments based on different text generation tasks demonstrate that the proposed AMI framework can significantly outperform several strong baselines, and we also show that AMI has the potential to lead to a tighter lower bound of maximum mutual information for the variational information maximization problem.
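For readers unfamiliar with the bound the abstract refers to, the standard variational information maximization lower bound can be written as below. The notation is an assumption made here for illustration ($S$ source, $T$ target, $q_\phi(s \mid t)$ the backward network); the paper's own symbols may differ.

```latex
\begin{aligned}
I(S;T) &= H(S) - H(S \mid T) \\
       &= H(S) + \mathbb{E}_{p(s,t)}\big[\log q_\phi(s \mid t)\big]
          + \mathbb{E}_{p(t)}\Big[\mathrm{KL}\big(p(s \mid t)\,\|\,q_\phi(s \mid t)\big)\Big] \\
       &\ge H(S) + \mathbb{E}_{p(s,t)}\big[\log q_\phi(s \mid t)\big].
\end{aligned}
```

The gap between the two sides is exactly the expected KL term, so the bound is tight only when the backward network matches the true posterior $p(s \mid t)$. This is why the quality of the backward model matters for the tightness the abstract mentions.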
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Lepikhin, Dmitry, Lee, HyoukJoong, Xu, Yuanzhong, Chen, Dehao, Firat, Orhan, Huang, Yanping, Krikun, Maxim, Shazeer, Noam, Chen, Zhifeng
Neural network scaling has been critical for improving model quality in many real-world machine learning applications with vast amounts of training data and compute. Although this trend of scaling is affirmed to be a sure-fire approach for better model quality, there are challenges on the path such as the computation cost, ease of programming, and efficient implementation on parallel devices. GShard is a module composed of a set of lightweight annotation APIs and an extension to the XLA compiler. It provides an elegant way to express a wide range of parallel computation patterns with minimal changes to the existing model code. GShard enabled us to scale up a multilingual neural machine translation Transformer model with Sparsely-Gated Mixture-of-Experts beyond 600 billion parameters using automatic sharding. We demonstrate that such a giant model can be trained efficiently on 2048 TPU v3 accelerators in 4 days to achieve far superior quality for translation from 100 languages to English compared to the prior art.
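As a rough illustration of the conditional computation behind a sparsely-gated mixture-of-experts layer, the sketch below routes one token to its two highest-scoring experts and mixes their outputs. Top-2 routing, the function names, and the toy experts are all assumptions made for this example; the real system additionally handles expert capacity limits, load balancing, and sharding across devices, none of which is shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_gate(gate_logits):
    """Return [(expert_index, weight), ...] for the two highest-scoring experts."""
    probs = softmax(gate_logits)
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = sum(probs[i] for i in top2)
    # Renormalize so the two selected weights sum to 1.
    return [(i, probs[i] / norm) for i in top2]


def moe_layer(token, gate_logits, experts):
    """Weighted sum of the outputs of the two selected experts for one token."""
    out = [0.0] * len(token)
    for idx, weight in top2_gate(gate_logits):
        expert_out = experts[idx](token)  # only selected experts are evaluated
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Toy experts (hypothetical): expert k simply scales its input by k.
experts = [lambda v, k=k: [k * x for x in v] for k in range(4)]
result = moe_layer([1.0, 2.0], [0.0, 5.0, 5.0, -1.0], experts)
```

Because only two of the four experts run for this token, compute per token stays roughly constant as more experts (and parameters) are added, which is the property that makes very large models trainable.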
An EM Approach to Non-autoregressive Conditional Sequence Generation
Autoregressive (AR) models have been the dominant approach to conditional sequence generation, but they suffer from high inference latency. Non-autoregressive (NAR) models have recently been proposed to reduce the latency by generating all output tokens in parallel, but they achieve inferior accuracy compared to their autoregressive counterparts, primarily due to the difficulty of dealing with multi-modality in sequence generation. This paper proposes a new approach that jointly optimizes both AR and NAR models in a unified Expectation-Maximization (EM) framework. In the E-step, an AR model learns to approximate the regularized posterior of the NAR model. In the M-step, the NAR model is updated on the new posterior and selects the training examples for the next AR model. This iterative process can effectively guide the system to remove the multi-modality in the output sequences. To our knowledge, this is the first EM approach to NAR sequence generation. We evaluate our method on the task of machine translation. Experimental results on benchmark data sets show that the proposed approach achieves competitive, if not better, performance compared with existing NAR models and significantly reduces the inference latency.
Multi-Head Attention: Collaborate Instead of Concatenate
Cordonnier, Jean-Baptiste, Loukas, Andreas, Jaggi, Martin
Attention layers are widely used in natural language processing (NLP) and are beginning to influence computer vision architectures. However, they suffer from over-parameterization. For instance, it was shown that the majority of attention heads could be pruned without impacting accuracy. This work aims to enhance current understanding of how multiple heads interact. Motivated by the observation that trained attention heads share common key/query projections, we propose a collaborative multi-head attention layer that enables heads to learn shared projections. Our scheme reduces the computational cost and number of parameters in an attention layer and can be used as a drop-in replacement in any transformer architecture. For instance, by allowing heads to collaborate on a neural machine translation task, we can reduce the key dimension by a factor of eight without any loss in performance. We also show that it is possible to re-parametrize a pre-trained multi-head attention layer into our collaborative attention layer. Even without retraining, collaborative multi-head attention manages to reduce the size of the key and query projections by half without sacrificing accuracy.
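To get a feel for what sharing key/query projections saves, the sketch below counts key and query parameters under two layouts. It assumes one plausible parameterization (a single shared query projection and a single shared key projection, plus one mixing vector per head) and ignores biases; the function names and the dimensions in the example are hypothetical and not taken from the paper's experiments.

```python
def standard_qk_params(d_in, num_heads, d_head):
    """Key + query parameters when every head has its own projections.

    Each head owns a query and a key matrix of shape (d_in, d_head).
    """
    return 2 * d_in * num_heads * d_head


def collaborative_qk_params(d_in, num_heads, shared_dim):
    """Key + query parameters when heads share projections (assumed layout).

    One shared query and one shared key matrix of shape (d_in, shared_dim),
    plus a per-head mixing vector of length shared_dim.
    """
    return 2 * d_in * shared_dim + num_heads * shared_dim


# Illustrative sizes: 8 heads of width 64 on a 512-dimensional input.
standard = standard_qk_params(d_in=512, num_heads=8, d_head=64)
# Shared key dimension reduced by a factor of eight (512 -> 64).
shared = collaborative_qk_params(d_in=512, num_heads=8, shared_dim=64)
print(standard, shared)  # prints: 524288 66048
```

Under these assumptions the mixing vectors add only a few hundred parameters, so the saving is dominated by the smaller shared projection width.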
In English, Machine Translation Makes You Sound Like a Man in His Middle Age
24/06/2020. Three Bocconi scholars found an algorithmic bias in the systems of Google, Bing, and DeepL when translating from several European languages into English. Imagine a child raised in a village inhabited only by middle-aged men. For the first ten years of her life, she only hears males in their 60s talking of work, books, sports, health, and money. What kind of weird language do you think she will speak when she leaves the village? Something similar happens to the most common machine translation systems, according to a new study by Dirk Hovy, an Associate Professor of Computer Science at Bocconi, and two Postdoctoral Researchers in his lab, Federico Bianchi and Tommaso Fornaciari. To train a translation system based on machine learning, you feed it large amounts of text and let it learn by experience.
Tools For Building Machine Learning Models On Android
Since its debut in 2008, Android has become the world's biggest mobile platform in terms of popularity and number of users. Over the years, Android developers have built on advances in machine learning to deliver features like on-device speech recognition, real-time video interactivity, and real-time enhancements when taking a photo or selfie. In addition, image recognition with machine learning lets users point their smartphone camera at text and have it live-translated into 88 different languages with the help of Google Translate. Android users can even point their camera at a beautiful flower, use Google Lens to identify what type of flower it is, and then set a reminder to order a bouquet for someone. Google Lens uses computer vision models to expand and speed up web search and the mobile experience.