Machine Translation
AI and the Everything in the Whole Wide World Benchmark
Raji, Inioluwa Deborah, Bender, Emily M., Paullada, Amandalynne, Denton, Emily, Hanna, Alex
There is a tendency across different subfields in AI to valorize a small collection of influential benchmarks. These benchmarks operate as stand-ins for a range of anointed common problems that are frequently framed as foundational milestones on the path towards flexible and generalizable AI systems. State-of-the-art performance on these benchmarks is widely understood as indicative of progress towards these long-term goals. In this position paper, we explore the limits of such benchmarks in order to reveal the construct validity issues in their framing as the functionally "general" broad measures of progress they are set up to be.
Does constituency analysis enhance domain-specific pre-trained BERT models for relation extraction?
Tang, Anfu, Deléger, Louise, Bossy, Robert, Zweigenbaum, Pierre, Nédellec, Claire
Many studies have recently addressed relation extraction. The DrugProt track at BioCreative VII provides a manually annotated corpus for developing and evaluating relation extraction systems that target interactions between chemicals and genes. We describe the ensemble system used for our submission, which combines the predictions of fine-tuned bioBERT, sciBERT and const-bioBERT models by majority voting. We specifically tested the contribution of syntactic information to relation extraction with BERT. We observed that adding constituent-based syntactic information to BERT improved precision but decreased recall, since relations rarely seen in the training set were less likely to be predicted by the BERT models into which syntactic information was infused. Our code is available online [https://github.com/Maple177/drugprot-relation-extraction].
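Majority voting over an ensemble is simple enough to sketch directly. The function below is an illustrative assumption, not the authors' code: it takes one predicted relation label per model for a single chemical-gene pair and returns the most frequent one. The abstract does not say how ties are broken, so the rule used here (first label in model order wins) is an assumption.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most models.

    predictions: one label per model, e.g. the outputs of the bioBERT,
    sciBERT and const-bioBERT classifiers for one entity pair.
    Ties go to the label that appears first in the list (assumed rule).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:  # first label reaching the top count wins
        if counts[label] == best:
            return label


# Hypothetical per-model outputs for one chemical-gene pair
print(majority_vote(["INHIBITOR", "NONE", "INHIBITOR"]))
```

With three voters and a binary decision a tie cannot occur, but with many relation types all three models can disagree, which is why some tie-breaking rule is needed.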
Microsoft's Tutel optimizes mixture of experts model training
Microsoft this week announced Tutel, a library to support the development of mixture of experts (MoE) models, a particular type of large-scale AI model. Tutel, which is open source and has been integrated into fairseq, one of Facebook's toolkits in PyTorch, is designed to enable developers across AI disciplines to "execute MoE more easily and efficiently," Microsoft says. MoE models are made up of small clusters of "neurons" that are only active under special, specific circumstances. Lower "layers" of the MoE model extract features, and experts are called upon to evaluate those features.
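To make the idea of experts that are "only active under specific circumstances" concrete, here is a minimal sketch of a top-k gated mixture-of-experts layer in plain Python. It is not Tutel's API; the linear gate, the renormalization over the selected experts, and all function names are illustrative assumptions. Only the experts chosen by the gate are evaluated, which is what keeps MoE computation sparse.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_weights, experts, k=1):
    """Route input vector x to the top-k experts and mix their outputs.

    gate_weights: one weight vector per expert (a linear gate, assumed).
    experts: callables mapping a vector to a vector of the same length.
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    probs = softmax(logits)
    # Indices of the k experts with the highest gate probability
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over selected experts
    out = [0.0] * len(x)
    for i in top:  # unselected experts are never called
        y = experts[i](x)
        for j, v in enumerate(y):
            out[j] += probs[i] / norm * v
    return out


gates = [[2.0, 0.0], [0.0, 2.0]]
experts = [lambda v: [2 * a for a in v], lambda v: [-a for a in v]]
print(moe_forward([1.0, 0.0], gates, experts, k=1))
```

In practice the experts are neural sub-networks and inputs are routed token by token, often across devices; this sketch only shows the gating arithmetic.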
Sentence correction to improve NLP tasks performance
Many public and social media platforms are used for communicating, exchanging and sharing information, expressing feelings, and so on. Many state-of-the-art NLP tasks run on the text data available on these platforms, but this text often does not follow the distribution of standard English, which degrades the performance of those tasks. We therefore take a corrupted input sentence and project it onto a target sentence that lies within the distribution of standard English, which can improve the performance of most NLP tasks. The input sentences contain corruptions, and we convert them into standard English while preserving their semantic meaning. As in the research paper, we use sequence cross-entropy (categorical cross-entropy) as the loss function, summing the cross-entropy loss over the time steps, where each step predicts the character for that step.
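The loss described above can be sketched as follows, assuming the model emits a probability distribution over the character vocabulary at every time step (the function name and input format are assumptions). Each step contributes the negative log-probability of the correct character, and the per-step losses are summed.

```python
import math


def sequence_cross_entropy(step_probs, targets):
    """Sum of per-time-step categorical cross-entropy.

    step_probs[t]: predicted probability distribution over the character
                   vocabulary at time step t.
    targets[t]:    index of the correct character at time step t.
    """
    if len(step_probs) != len(targets):
        raise ValueError("need one target per time step")
    loss = 0.0
    for probs, target in zip(step_probs, targets):
        # Cross-entropy against a one-hot target reduces to -log p(target)
        loss += -math.log(probs[target])
    return loss


# Two time steps over a two-character vocabulary
print(sequence_cross_entropy([[0.5, 0.5], [0.25, 0.75]], [0, 1]))
```

Many frameworks average rather than sum by default, which only rescales the loss by the sequence length.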
Introducing the First AI Model That Translates 100 Languages Without Relying on English
Next, we introduced a new bridge mining strategy, in which we group languages into 14 language groups based on linguistic classification, geography, and cultural similarities. People living in countries with languages of the same family tend to communicate more often and would benefit from high-quality translations. For instance, one group would include languages spoken in India, like Bengali, Hindi, Marathi, Nepali, Tamil, and Urdu. To connect the languages of different groups, we identified a small number of bridge languages, which are usually one to three major languages of each group. In the example above, Hindi, Bengali, and Tamil would be the bridge languages for this group.
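The pair-selection logic of a bridge strategy can be sketched as: mine every pair within a group, and across groups mine only pairs of bridge languages. The function below is an illustrative assumption, not Meta's pipeline. The India group and its bridges come from the text; the second group is hypothetical and only there to show a cross-group pair.

```python
from itertools import combinations


def bridge_mining_pairs(groups, bridges):
    """Return the set of language pairs to mine under a bridge strategy.

    groups:  mapping from group name to its member languages.
    bridges: mapping from group name to that group's bridge languages.
    """
    pairs = set()
    # Every pair of languages inside the same group
    for members in groups.values():
        for a, b in combinations(members, 2):
            pairs.add(tuple(sorted((a, b))))
    # Across groups, only bridge-to-bridge pairs
    for g1, g2 in combinations(groups, 2):
        for a in bridges[g1]:
            for b in bridges[g2]:
                pairs.add(tuple(sorted((a, b))))
    return pairs


groups = {
    "india": ["Bengali", "Hindi", "Marathi", "Nepali", "Tamil", "Urdu"],
    "other": ["French", "Italian", "Spanish"],  # hypothetical group
}
bridges = {"india": ["Hindi", "Bengali", "Tamil"], "other": ["French"]}
pairs = bridge_mining_pairs(groups, bridges)
print(len(pairs))
```

Here 15 + 3 within-group pairs plus 3 bridge pairs give 21 directions to mine instead of all 36 pairs of the nine languages; the saving grows quickly as groups are added.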
Meta AI Takes a Step Towards Building a Universal Translation System
What does the curved arrow in Amazon's logo signify? It portrays the idea that one can get everything from A to Z on a single platform, making the task easy. The same idea applies to translation systems (producing text in one language from text in another). To that end, Meta AI announced a breakthrough: a new multilingual model that outperformed state-of-the-art bilingual models on 10 of 14 language pairs and won the Conference on Machine Translation (WMT) competition, a prestigious MT competition. The model is a step towards building a universal translation system. As Meta AI put it: "We built & open sourced the first-ever multilingual model to win the prestigious WMT competition, showing this approach is the future of machine translation."
Top 12 Machine Learning Algorithms You Should Know to Become a Data Scientist
Let's say I am given an Excel sheet with data about various fruits and I have to tell which ones look like apples. I first ask the question "Which fruits are red and round?" and split the fruits into those that answer yes and those that answer no. Now, not all red and round fruits are apples, and not all apples are red and round. So I ask "Which fruits have hints of red or yellow on them?" of the red and round fruits, and "Which fruits are green and round?" of the fruits that are not red and round. Based on these questions I can tell with considerable accuracy which fruits are apples. This cascade of questions is what a decision tree is. However, this is a decision tree based on my intuition.
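The intuitive tree above can be written as nested conditionals. This is a minimal sketch; the boolean feature names (`red`, `round`, `green`, `red_or_yellow_hints`) are hypothetical column names standing in for the spreadsheet.

```python
def looks_like_apple(fruit):
    """Hand-built decision tree following the questions in the text.

    fruit: dict of boolean features with the hypothetical keys
    'red', 'round', 'green' and 'red_or_yellow_hints'.
    """
    if fruit["red"] and fruit["round"]:
        # Red and round branch: ask about red or yellow hints
        return fruit["red_or_yellow_hints"]
    # Not red and round branch: ask whether it is green and round
    return fruit["green"] and fruit["round"]


green_apple = {"red": False, "round": True, "green": True, "red_or_yellow_hints": False}
print(looks_like_apple(green_apple))
```

A learned decision tree follows the same structure, but chooses each question (split) from the data, for example by picking the feature that best separates apples from non-apples, instead of relying on intuition.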
DEEP: DEnoising Entity Pre-training for Neural Machine Translation
Hu, Junjie, Hayashi, Hiroaki, Cho, Kyunghyun, Neubig, Graham
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus. Earlier named entity translation methods mainly focus on phonetic transliteration, which ignores the sentence context for translation and is limited in domain and language coverage. To address this limitation, we propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences. In addition, we investigate a multi-task learning strategy that fine-tunes a pre-trained neural machine translation model on both entity-augmented monolingual data and parallel data to further improve entity translation. Experimental results on three language pairs demonstrate that DEEP yields significant improvements over strong denoising auto-encoding baselines, with gains of up to 1.3 BLEU and up to 9.2 entity accuracy points for English-Russian translation.
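One way to picture entity-based denoising data is to take a monolingual sentence, replace its entity mentions with their names in another language looked up in a knowledge base, and train the model to reconstruct the original sentence. The sketch below is an assumption for illustration, not the paper's implementation: the `entity_noise` function is hypothetical, the small dictionary stands in for a real knowledge base, and only single-token entities are handled.

```python
def entity_noise(tokens, kb):
    """Build a (noisy input, target) pair for denoising pre-training.

    tokens: a tokenized monolingual sentence.
    kb:     hypothetical stand-in for a knowledge base, mapping an entity
            mention to its name in another language.
    Mentions found in kb are replaced; all other tokens are kept.
    """
    noisy = [kb.get(tok, tok) for tok in tokens]
    return noisy, list(tokens)


# Russian sentence "Moscow is the capital of Russia"; English entity names
kb = {"Москва": "Moscow", "России": "Russia"}
noisy, target = entity_noise(["Москва", "является", "столицей", "России"], kb)
print(noisy)
```

A model trained to map `noisy` back to `target` must learn to render the entities in the target language in context, which is the intuition behind using monolingual data plus a knowledge base. Real data would also need entity linking and multi-token, inflected mentions.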
Attention Mechanism in Vision Models
In this article, we explore the attention mechanism and then examine its application in vision models. Attention was first introduced for neural machine translation in the paper by Bahdanau et al. Attention is a technique that enables a network to focus on the parts of the input data that are most important for making a prediction. Since its introduction, it has revolutionized NLP, becoming a key component of most state-of-the-art models for a wide variety of tasks. The first paper we discuss is "Attention Is All You Need" by Vaswani et al. from Google Brain.
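As a concrete reference point, here is a minimal sketch of the scaled dot-product attention used in "Attention Is All You Need", computing softmax(Q K^T / sqrt(d_k)) V on plain Python lists. Bahdanau et al. used an additive scoring function instead; the function names here are illustrative.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V for lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # attention weights sum to 1
        # Weighted average of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [0.0, 4.0]]))
```

Because both keys are identical, the query attends to them equally and the output is the average of the two value vectors; a query closer to one key would shift the weight toward that key's value.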