Machine Translation
Stand-Alone Self-Attention in Vision Models
Ramachandran, Prajit, Parmar, Niki, Vaswani, Ashish, Bello, Irwan, Levskaya, Anselm, Shlens, Jon
Convolutions are a fundamental building block of modern computer vision systems. Recent approaches have argued for going beyond convolutions in order to capture long-range dependencies. These efforts focus on augmenting convolutional models with content-based interactions, such as self-attention and non-local means, to achieve gains on a number of vision tasks. The natural question that arises is whether attention can be a stand-alone primitive for vision models instead of serving as just an augmentation on top of convolutions. In developing and testing a pure self-attention vision model, we verify that self-attention can indeed be an effective stand-alone layer. A simple procedure of replacing all instances of spatial convolutions in ResNet-50 with a form of self-attention produces a fully self-attentional model that outperforms the baseline on ImageNet classification with 12% fewer FLOPS and 29% fewer parameters. On COCO object detection, a fully self-attentional model matches the mAP of a baseline RetinaNet while having 39% fewer FLOPS and 34% fewer parameters. Detailed ablation studies demonstrate that self-attention is especially impactful when used in later layers. These results establish that stand-alone self-attention is an important addition to the vision practitioner's toolbox.
Learning from Learning Machines: Optimisation, Rules, and Social Norms
LaCroix, Travis, Bengio, Yoshua
There is an analogy between machine learning systems and economic entities in that they are both adaptive, and their behaviour is specified in a more-or-less explicit way. It appears that the area of AI that is most analogous to the behaviour of economic entities is that of morally good decision-making, but it is an open question as to how precisely moral behaviour can be achieved in an AI system. This paper explores the analogy between these two complex systems, and we suggest that a clearer understanding of this apparent analogy may help us forward in both the socio-economic domain and the AI domain: known results in economics may help inform feasible solutions in AI safety, but also known results in AI may inform economic policy. If this claim is correct, then the recent successes of deep learning for AI suggest that more implicit specifications work better than explicit ones for solving such problems.
Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation
Song, Haiyue, Dabre, Raj, Fujita, Atsushi, Kurohashi, Sadao
Lectures translation is a case of spoken language translation and there is a lack of publicly available parallel corpora for this purpose. To address this, we examine a language independent framework for parallel corpus mining which is a quick and effective way to mine a parallel corpus from publicly available lectures at Coursera. Our approach determines sentence alignments, relying on machine translation and cosine similarity over continuous-space sentence representations. We also show how to use the resulting corpora in a multistage fine-tuning based domain adaptation for high-quality lectures translation. For Japanese--English lectures translation, we extracted parallel data of approximately 40,000 lines and created development and test sets through manual filtering for benchmarking translation performance. We demonstrate that the mined corpus greatly enhances the quality of translation when used in conjunction with out-of-domain parallel corpora via multistage training. This paper also suggests some guidelines to gather and clean corpora, mine parallel sentences, address noise in the mined data, and create high-quality evaluation splits. For the sake of reproducibility, we will release our code for parallel data creation.
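The alignment step described above can be illustrated with a minimal sketch. It assumes that sentence vectors for both sides are already available in a shared space (for example, after machine-translating one side); the function names, the greedy matching, the 0.8 threshold, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align(src_vecs, tgt_vecs, threshold=0.8):
    """Greedily pair each source sentence with its most similar target.

    Returns (src_index, tgt_index, score) triples whose score meets the
    (hypothetical) threshold; lower-scoring pairs are treated as noise.
    """
    if not tgt_vecs:
        return []
    pairs = []
    for i, u in enumerate(src_vecs):
        scores = [cosine(u, v) for v in tgt_vecs]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


# Toy 3-dimensional "sentence embeddings", made up for illustration.
src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
tgt = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.5]]
pairs = align(src, tgt)
```

In this toy run the first two source sentences find a confident match and the third is discarded, which mirrors the idea of filtering noisy candidates before building evaluation splits.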
Artificial intelligence jobs on the rise, along with everything else AI ZDNet
AI jobs are on the upswing, as are the capabilities of AI systems. The speed of deployments has also increased exponentially. It's now possible to train an image-processing algorithm in about a minute -- something that took hours just a couple of years ago. These are among the key metrics of AI tracked in the latest release of the AI Index, an annual data update from Stanford University's Human-Centered Artificial Intelligence Institute published in partnership with McKinsey Global Institute. The index tracks AI growth across a range of metrics, from papers published to patents granted to employment numbers.
Tag-less Back-Translation
Abdulmumin, Idris, Galadanci, Bashir Shehu, Garba, Aliyu
An effective method to generate a large number of parallel sentences for training improved neural machine translation (NMT) systems is the use of back-translations of the target-side monolingual data. Tagging, or using gates, has been used to enable translation models to distinguish between synthetic and natural data. This improves standard back-translation and also enables the use of iterative back-translation on language pairs that underperformed using standard back-translation. This work presents a simplified approach to differentiating between the two kinds of data using pretraining and finetuning. The approach - tag-less back-translation - trains the model on the synthetic data and finetunes it on the natural data. Preliminary experiments have shown the approach to consistently outperform the tagging approach on low-resource English-Vietnamese neural machine translation. While the need for tagging (noising) the dataset has been removed, the approach outperformed the tagged back-translation approach by an average of 0.4 BLEU.
The 10 Algorithms Data Scientist must have to Know.
Let's say I am given an Excel sheet with data about various fruits and I have to tell which look like apples. What I will do is ask a question, "Which fruits are red and round?", and divide the fruits into those that answer yes and those that answer no. Now, not all red and round fruits are apples, and not all apples are red and round. So I will ask "Which fruits have red or yellow color hints on them?" of the red and round fruits, and "Which fruits are green and round?" of the fruits that are not red and round. Based on these questions I can tell with considerable accuracy which are apples. This cascade of questions is what a decision tree is. However, this is a decision tree based on my intuition.
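The cascade of questions above can be written out as a small hand-built tree. This is only a sketch of the intuition-based version: the field names (`color`, `shape`, `hints`) and the example rows are hypothetical, and a learned decision tree would choose its questions from the data instead.

```python
def looks_like_apple(fruit):
    """Hand-written decision tree following the cascade of questions."""
    if fruit["color"] == "red" and fruit["shape"] == "round":
        # Red and round branch: keep fruits with red or yellow hints.
        return fruit["hints"] in ("red", "yellow")
    # Other branch: green, round fruits are still counted as apples.
    return fruit["color"] == "green" and fruit["shape"] == "round"


# Made-up rows standing in for the Excel sheet.
fruits = [
    {"name": "Gala apple", "color": "red", "shape": "round", "hints": "yellow"},
    {"name": "pomegranate", "color": "red", "shape": "round", "hints": "pink"},
    {"name": "Granny Smith", "color": "green", "shape": "round", "hints": "none"},
    {"name": "banana", "color": "yellow", "shape": "long", "hints": "none"},
]
apples = [f["name"] for f in fruits if looks_like_apple(f)]
```

Such a tree is accurate only to the extent that the chosen questions are good ones; a lime, for instance, would pass the green-and-round test.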
Techniques for Interpretable Machine Learning
Machine learning is progressing at an astounding rate, powered by complex models such as ensemble models and deep neural networks (DNNs). These models have a wide range of real-world applications, such as Netflix's movie recommendations, Google's neural machine translation, and Amazon Alexa's speech recognition. Despite the successes, machine learning has its own limitations and drawbacks. The most significant one is the lack of transparency behind these models' behaviors, which leaves users with little understanding of how particular decisions are made. Consider, for instance, an advanced self-driving car equipped with various machine learning algorithms that does not brake or decelerate when confronting a stopped firetruck. This unexpected behavior may frustrate and confuse users, making them wonder why. Even worse, the wrong decision could have severe consequences if the car is driving at highway speeds and ultimately crashes into the firetruck. The concerns about the black-box nature of complex models have hampered their further application in our society, especially in critical decision-making domains like self-driving cars. Interpretable machine learning would be an effective tool to mitigate these problems. It gives machine learning models the ability to explain or to present their behaviors in terms understandable to humans, which is called interpretability or explainability; we use the two terms interchangeably in this article. Interpretability would be an indispensable part of machine learning models in order to better serve human beings and bring benefits to society. For end users, explanation will increase their trust and encourage them to adopt machine learning systems. From the perspective of machine learning system developers and researchers, the provided explanation can help them better understand the problem, the data, and why a model might fail, and eventually increase system safety.
Thus, there is a growing interest among the academic and industrial community in interpreting machine learning models and gaining insights into their working mechanisms.
When machine learning packs an economic punch
A new study co-authored by an MIT economist shows that improved translation software can significantly boost international trade online -- a notable case of machine learning having a clear impact on economic activity. The research finds that after eBay improved its automatic translation program in 2014, commerce shot up by 10.9 percent among pairs of countries where people could use the new system. "To have it be so clear in such a short amount of time really says a lot about the power of this technology," says Erik Brynjolfsson, an MIT economist and co-author of a new paper detailing the results. To put the results in perspective, he adds, consider that physical distance is, by itself, also a significant barrier to global commerce. The 10.9 percent change generated by eBay's new translation software increases trade by the same amount as "making the world 26 percent smaller, in terms of its impact on the goods that we studied," he says. The paper, "Does Machine Translation Affect International Trade?
Black Box Recursive Translations for Molecular Optimization
Damani, Farhan, Sresht, Vishnu, Ra, Stephen
Machine learning algorithms for generating molecular structures offer a promising new approach to drug discovery. We cast molecular optimization as a translation problem, where the goal is to map an input compound to a target compound with improved biochemical properties. Remarkably, we observe that when generated molecules are iteratively fed back into the translator, molecular compound attributes improve with each step. We show that this finding is invariant to the choice of translation model, making this a "black box" algorithm. We call this method Black Box Recursive Translation (BBRT), a new inference method for molecular property optimization. This simple, powerful technique operates strictly on the inputs and outputs of any translation model. We obtain new state-of-the-art results for molecular property optimization tasks using our simple drop-in replacement with well-known sequence and graph-based models. Our method provides a significant boost in performance relative to its non-recursive peers with just a simple "for" loop. Further, BBRT is highly interpretable, allowing users to map the evolution of newly discovered compounds from known starting points.
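The recursive idea can be sketched as the "for" loop the abstract mentions. This is a minimal illustration, not the authors' implementation: `recursive_translate`, `toy_translate`, and `toy_property` are hypothetical names, and the translator here is a stand-in that operates on numbers rather than molecules.

```python
def recursive_translate(translate, seed, steps=5):
    """Feed each output back into a black-box translator.

    `translate` is any callable mapping an input to an output; only its
    inputs and outputs are used. The full trajectory is returned so the
    evolution from the starting point can be inspected.
    """
    trajectory = [seed]
    for _ in range(steps):
        trajectory.append(translate(trajectory[-1]))
    return trajectory


def toy_translate(x):
    """Stand-in translator: nudges a number halfway toward 10."""
    return x + (10.0 - x) / 2.0


def toy_property(x):
    """Stand-in property score: higher is better, best at x == 10."""
    return -abs(10.0 - x)


path = recursive_translate(toy_translate, seed=2.0, steps=4)
scores = [toy_property(x) for x in path]
```

Because the loop touches only the translator's inputs and outputs, any model can be dropped in for `toy_translate`, and the stored trajectory is what makes each discovered candidate traceable back to its starting point.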
Cross-Lingual Ability of Multilingual BERT: An Empirical Study
K, Karthikeyan, Wang, Zihan, Mayhew, Stephen, Roth, Dan
Recent work has exhibited the surprising cross-lingual abilities of multilingual BERT (M-BERT) - surprising since it is trained without any cross-lingual objective and with no aligned data. In this work, we provide a comprehensive study of the contribution of different components in M-BERT to its cross-lingual ability. The experimental study is done in the context of three typologically different languages - Spanish, Hindi, and Russian - and using two conceptually different NLP tasks, textual entailment and named entity recognition. Among our key conclusions is the fact that the lexical overlap between languages plays a negligible role in the cross-lingual success, while the depth of the network is an integral part of it. Embeddings of natural language text via unsupervised learning, coupled with sufficient supervised training data, have been ubiquitous in NLP in recent years and have shown success in a wide range of monolingual NLP tasks, mostly in English. Training models for other languages have been shown more difficult, and recent approaches relied on bilingual embeddings that allowed the transfer of supervision in high resource languages like English to models in lower resource languages; however, inducing these bilingual embeddings required some level of supervision (Upadhyay et al., 2016). Not only the model is contextual, but its training also requires no supervision - no alignment between the languages is done. Nevertheless, and despite being trained with no explicit cross-lingual objective, M-BERT produces a representation that seems to generalize well across languages for a variety of downstream tasks (Wu & Dredze, 2019). In this work, we attempt to develop an understanding of the success of M-BERT.