Machine Translation


Transformers without Tears: Improving the Normalization of Self-Attention

arXiv.org Machine Learning

We evaluate three simple, normalization-centric changes to improve Transformer training. First, we show that pre-norm residual connections (PreNorm) and smaller initializations enable warmup-free, validation-based training with large learning rates. Second, we propose $\ell_2$ normalization with a single scale parameter (ScaleNorm) for faster training and better performance. Finally, we reaffirm the effectiveness of normalizing word embeddings to a fixed length (FixNorm). On five low-resource translation pairs from TED Talks-based corpora, these changes always converge, giving an average +1.1 BLEU over state-of-the-art bilingual baselines and a new 32.8 BLEU on IWSLT'15 English-Vietnamese. We observe sharper performance curves, more consistent gradient norms, and a linear relationship between activation scaling and decoder depth. Surprisingly, in the high-resource setting (WMT'14 English-German), ScaleNorm and FixNorm remain competitive but PreNorm degrades performance.
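The three changes named in the abstract can be illustrated with a minimal pure-Python sketch. It assumes ScaleNorm rescales a vector to a learned length `g`, FixNorm rescales an embedding to a fixed length, and PreNorm applies the normalization before the sublayer inside the residual connection. The function names, the `eps` floor, and the toy sublayer are illustrative assumptions, not the authors' implementation.

```python
import math

def l2_norm(vec, eps=1e-5):
    """Euclidean length of vec, floored at eps to avoid division by zero."""
    return max(math.sqrt(sum(v * v for v in vec)), eps)

def scale_norm(vec, g):
    """ScaleNorm sketch: rescale vec to length g (g stands in for the single learned scalar)."""
    n = l2_norm(vec)
    return [g * v / n for v in vec]

def fix_norm(embedding, length=1.0):
    """FixNorm sketch: rescale a word embedding to a fixed length (assumed 1.0 here)."""
    n = l2_norm(embedding)
    return [length * v / n for v in embedding]

def pre_norm_block(x, sublayer, g):
    """PreNorm residual: normalize the sublayer input, then add the untouched skip path."""
    y = sublayer(scale_norm(x, g))
    return [xi + yi for xi, yi in zip(x, y)]

# Toy usage: a hypothetical sublayer that doubles its input.
x = [3.0, 4.0]
normed = scale_norm(x, g=2.0)
unit_embedding = fix_norm(x)
out = pre_norm_block(x, lambda h: [2.0 * v for v in h], g=2.0)
```

In this sketch the skip path carries `x` unnormalized, which is the property usually credited for PreNorm's more stable gradients; a post-norm block would instead normalize the sum `x + sublayer(x)`.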


Using Neural Machine Translation for Multilingual Communication

#artificialintelligence

A new type of Artificial Intelligence (AI) technology, called Neural Machine Translation (NMT), is quickly earning the attention of multilingual communities. This software is helping to expedite the translation process and has the potential to open government information to more non-English languages. In this session, Beth Flaherty will give a high-level overview of machine translation technology. We will discuss the evolution of machine translation (MT), how MT is used in the government, ways to "specialize" a language engine to a specific domain, calculation of return on investment (ROI), and the road ahead. We'll also show a live demo of the NMT software so that the audience can see the flexibility of use with this technology.


BiPaR: A Bilingual Parallel Dataset for Multilingual and Cross-lingual Reading Comprehension on Novels

arXiv.org Artificial Intelligence

This paper presents BiPaR, a bilingual parallel novel-style machine reading comprehension (MRC) dataset, developed to support multilingual and cross-lingual reading comprehension. The biggest difference between BiPaR and existing reading comprehension datasets is that each triple (Passage, Question, Answer) in BiPaR is written in parallel in two languages. We collect 3,667 bilingual parallel paragraphs from Chinese and English novels, from which we construct 14,668 parallel question-answer pairs via crowdsourced workers following a strict quality control procedure. We analyze BiPaR in depth and find that BiPaR offers good diversification in prefixes of questions, answer types and relationships between questions and passages. We also observe that answering questions about novels requires reading comprehension skills such as coreference resolution, multi-sentence reasoning, and understanding of implicit causality. With BiPaR, we build monolingual, multilingual, and cross-lingual MRC baseline models. Even for the relatively simple monolingual MRC on this dataset, experiments show that a strong BERT baseline is over 30 points behind human performance in terms of both EM and F1 score, indicating that BiPaR provides a challenging testbed for monolingual, multilingual and cross-lingual MRC on novels. The dataset is available at https://multinlp.github.io/BiPaR/.
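For readers unfamiliar with the EM and F1 scores mentioned above, the following is a rough sketch of how such metrics are commonly computed for extractive reading comprehension. It assumes lowercasing and whitespace tokenization; the normalization BiPaR actually uses (in particular for Chinese text) is not stated in the abstract, so the function names and the tokenization here are assumptions.

```python
from collections import Counter

def exact_match(prediction, reference):
    """1.0 if the strings are identical after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall (whitespace tokens assumed)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Toy usage with made-up answer strings (not taken from the dataset).
em = exact_match("the old master", "old master")
f1 = token_f1("the old master", "old master")
```

A prediction with one extra token fails exact match but still earns partial F1 credit, which is why the two scores are usually reported together.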


Artificial Intelligence and how they are empowering Search for Mobile, Web Apps - Ongraph

#artificialintelligence

Google Translate is one of Google's most popular and useful products. It is built on artificial intelligence (AI) algorithms, and Google continually updates the application with new AI techniques. Google Translate now uses Neural Machine Translation, which has radically improved its results. The company's AI team calls it the Google Neural Machine Translation (GNMT) system.


Language Transfer for Early Warning of Epidemics from Social Media

arXiv.org Artificial Intelligence

Statements on social media can be analysed to identify individuals who are experiencing red flag medical symptoms, allowing early detection of the spread of diseases such as influenza. Since disease does not respect cultural borders and may spread between populations speaking different languages, we would like to build multilingual models. However, the data required to train models for every language may be difficult, expensive and time-consuming to obtain, particularly for low-resource languages. Taking Japanese as our target language, we explore methods by which data in one language might be used to build models for a different language. We evaluate strategies of training on machine translated data and of zero-shot transfer through the use of multilingual models. We find that the choice of source language affects performance, with Chinese-Japanese being a better language pair than English-Japanese. Training on machine translated data shows promise, especially when used in conjunction with a small amount of target language data.


Straker Translations on Twitter

#artificialintelligence

The Japanese language is wonderfully unique, complex & can be one of the hardest languages to learn. So how well does machine translation handle the Japanese language? Have a read of our latest blog to find out.


Machine Learning Intern (Summer 2020) ai-jobs.net

#artificialintelligence

Mozilla is hiring a Machine Learning Intern for our Emerging Technologies team. Emerging Technologies is Mozilla's early research and development organization focused on the areas of voice assistants, speech and language, and mixed reality. Our headquarters are based in the Bay Area, but this internship opportunity is at our Berlin Office. We are engineers, designers, makers, and problem solvers. We work in the fishbowl known as the open source community, with a clear focus on making the Web better.


Why 85% of AI projects fail

#artificialintelligence

Despite increased interest in and adoption of artificial intelligence (AI) in the enterprise, 85% of AI projects ultimately fail to deliver on their intended promises to business, according to a Thursday report from Pactera Technologies. A major source of AI challenges is found in senior leadership, the report, titled Artificial Intelligence Localization, Winners, Losers, Heroes, Spectators, and You, found. Some 77% of those surveyed said they face barriers to entry from senior management not seeing value or wanting to make the investment in the emerging technology. These findings are in line with those from a recent Dimensional Research report, which found that eight out of 10 organizations engaged with AI and machine learning said those projects had stalled, and 96% said they have run into problems with data quality, data labelling, and building model confidence. Pactera presented the report to a group of tech industry leaders including those from Facebook, Adobe, Amazon, and Microsoft at a recent private event in Seattle.


MLPerf Training Benchmark

arXiv.org Machine Learning

Machine learning is experiencing an explosion of software and hardware solutions, and needs industry-standard performance benchmarks to drive design and enable competitive evaluation. However, machine learning training presents a number of unique challenges to benchmarking that do not exist in other domains: (1) some optimizations that improve training throughput actually increase time to solution, (2) training is stochastic and time to solution has high variance, and (3) the software and hardware systems are so diverse that they cannot be fairly benchmarked with the same binary, code, or even hyperparameters. We present MLPerf, a machine learning benchmark that overcomes these challenges. We quantitatively evaluate the efficacy of MLPerf in driving community progress on performance and scalability across two rounds of results from multiple vendors.


Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

arXiv.org Machine Learning

Neural sequence-to-sequence models, particularly the Transformer, are the state of the art in machine translation. Yet these neural networks are very sensitive to architecture and hyper-parameter settings. Optimizing these settings by grid or random search is computationally expensive because it requires many training runs. In this paper, we incorporate architecture search into a single training run through auto-sizing, which uses regularization to delete neurons in a network over the course of training. On very low-resource language pairs, we show that auto-sizing can improve BLEU scores by up to 3.9 points while removing one-third of the parameters from the model.
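The abstract says only that regularization deletes neurons during training. One way this can work, sketched below under the assumption of a group-lasso style penalty applied with a proximal step, is to shrink each neuron's weight row toward zero and drop rows that reach exactly zero. The function names, the `step` value (learning rate times regularization strength), and the toy matrix are hypothetical.

```python
import math

def prox_group_lasso(weights, step):
    """Shrink each row (one neuron's weights) by `step` in L2 norm; small rows become exactly zero."""
    shrunk = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        # Soft-thresholding on the row norm: rows with norm <= step collapse to zero.
        scale = max(0.0, 1.0 - step / norm) if norm > 0 else 0.0
        shrunk.append([scale * w for w in row])
    return shrunk

def prune_dead_neurons(weights):
    """Remove rows whose weights are all zero, i.e. neurons the penalty has deleted."""
    return [row for row in weights if any(w != 0.0 for w in row)]

# Toy usage: the second neuron has a tiny norm and is zeroed out by the proximal step.
W = [[3.0, 4.0], [0.03, 0.04], [0.0, 1.0]]
W = prox_group_lasso(W, step=0.1)
kept = prune_dead_neurons(W)
```

In a real training loop the proximal step would follow each gradient update, so the network shrinks gradually within a single run rather than through separate search runs.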