Machine Translation


What's Next For AI: Solving Advanced Math Equations

#artificialintelligence

Any high school student would guess that a cosine is involved when they see the integral of a sine. Whether or not the student understands the reasoning behind these functions, the intuition does the job for them. Yet this intuition behind calculus is rarely explored. Though Newton and Leibniz developed advanced mathematics to solve real-world problems, today most schools teach differential equations through semantics. The linguistic appeal of mathematics might earn grades in high school, but in the world of research it falls flat.


Deep Learning (Interview With Jürgen Schmidhuber)

#artificialintelligence

Other ex-PhD students of mine joined DeepMind later, including the creator of DeepMind's "Neural Turing Machine," and a co-author of our paper on Atari-Go in 2010.


Using neural networks to solve advanced mathematics equations

#artificialintelligence

Facebook AI has built the first AI system that can solve advanced mathematics equations using symbolic reasoning. By developing a new way to represent complex mathematical expressions as a kind of language and then treating solutions as a translation problem for sequence-to-sequence neural networks, we built a system that outperforms traditional computation systems at solving integration problems and both first- and second-order differential equations. Previously, these kinds of problems were considered out of the reach of deep learning models, because solving complex equations requires precision rather than approximation. Neural networks excel at learning to succeed through approximation, such as recognizing that a particular pattern of pixels is likely to be an image of a dog or that features of a sentence in one language match those in another. Solving complex equations also requires the ability to work with symbolic data, such as the letters in the formula b² - 4ac = 7.
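To make the "expressions as a language" idea concrete, here is a minimal sketch of serializing a symbolic expression tree into a prefix-notation token sequence that a sequence-to-sequence model could consume. The tokenization scheme and the SymPy-based helper are illustrative assumptions, not Facebook AI's actual pipeline.

```python
# Sketch: flatten a symbolic expression into prefix-notation tokens,
# the kind of "sentence" a seq2seq translation model could operate on.
import sympy as sp

def to_prefix_tokens(expr):
    """Recursively flatten a SymPy expression tree into prefix-notation tokens."""
    if expr.is_Symbol or expr.is_Number:
        return [str(expr)]
    tokens = [type(expr).__name__]  # operator head first, e.g. 'Add', 'Mul', 'sin'
    for arg in expr.args:
        tokens.extend(to_prefix_tokens(arg))
    return tokens

x = sp.Symbol("x")
integrand = sp.sin(x) * x               # the "source sentence"
solution = sp.integrate(integrand, x)   # the "target sentence" a model would learn to emit

print(to_prefix_tokens(integrand))  # e.g. ['Mul', 'x', 'sin', 'x']
print(to_prefix_tokens(solution))   # prefix tokens of the antiderivative
```

Training data in this framing is simply pairs of such token sequences (problem, solution), which is what lets standard translation architectures be reused.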


r/MachineLearning - [D] [Machine Translation] Sources for the use of monolingual data in order to improve situations with already sufficient parallel data

#artificialintelligence

Does anyone know of scientific literature that shows that, even in cases in which we have enough parallel data (English-French), use of monolingual data can be beneficial? To me it seems reasonable that if we, for instance, added monolingual data to the decoder, it would be better at scoring candidate predictions in terms of fluency. That being said, I cannot find peer-reviewed articles that show this.
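As a hypothetical illustration of the intuition in the question, the sketch below rescores n-best candidate translations with a combination of the translation model's score and a monolingual language model's fluency score. The candidates, scores, and interpolation weight are invented for the example.

```python
# Toy n-best rescoring: combine a translation-model log-probability with a
# weighted monolingual LM log-probability to favor more fluent hypotheses.

def rescore(candidates, lm_weight=0.3):
    """candidates: list of (text, tm_logprob, lm_logprob) tuples."""
    return sorted(
        candidates,
        key=lambda c: c[1] + lm_weight * c[2],  # TM score + weighted LM fluency score
        reverse=True,
    )

nbest = [
    ("the cat sat on the mat", -4.1, -9.0),
    ("the cat sat the mat on", -4.0, -15.2),  # slightly better TM score, much worse fluency
]
best, *_ = rescore(nbest)
print(best[0])  # the fluent hypothesis wins after LM rescoring
```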


History of the first AI Winter

#artificialintelligence

AI has a long history. One can argue it started long before the term was even coined, first in stories and later in actual mechanical devices called automata. This chapter covers only the events relevant to the periods of the AI winters, without being exhaustive, in the hope of extracting knowledge that can be applied today. To aid understanding of the phenomenon of AI winters, the events leading up to them are examined. Many early ideas about thinking machines appeared in the late 1940s and '50s, from people like Turing and von Neumann.


Artificial intelligence: The good, the bad and the ugly

#artificialintelligence

Welcome to TechTalks' AI book reviews, a series of posts that explore the latest literature on AI. It wouldn't be an overstatement to say that artificial intelligence is one of the most confusing and least understood fields of science. On the one hand, we have headlines that warn of deep learning outperforming medical experts, creating its own language, and spinning fake news stories. On the other hand, AI experts point out that artificial neural networks, the key innovation of current AI techniques, fail at some of the most basic tasks that any human child can perform. Artificial intelligence is also marked by some of the most divisive disputes and rivalries.


Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization

Neural Information Processing Systems

We address the problem of learning classifiers when observations have multiple views, some of which may not be observed for all examples. We assume the existence of view generating functions which may complete the missing views in an approximate way. This situation corresponds for example to learning text classifiers from multilingual collections where documents are not available in all languages. In that case, Machine Translation (MT) systems may be used to translate each document in the missing languages. We derive a generalization error bound for classifiers learned on examples with multiple artificially created views.
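A toy sketch of the setup the abstract describes: each example has several "views" (e.g. the same document in different languages), some views are missing, and a view-generating function (here a noisy stand-in for an MT system) completes them before a per-view classifier ensemble votes. All data, names, and the simple averaging rule are illustrative assumptions, not the paper's method.

```python
# Toy multi-view learning with artificially completed missing views.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, n_views = 200, 10, 2

# Synthetic data: view 1 is observed for everyone, view 2 is missing for ~half the examples.
y = rng.integers(0, 2, size=n)
views = [rng.normal(loc=y[:, None], size=(n, d)) for _ in range(n_views)]
observed = [np.ones(n, dtype=bool), rng.random(n) < 0.5]

def generate_view(src_view):
    """Stand-in for an MT system: approximately map one view onto another."""
    return src_view + rng.normal(scale=0.5, size=src_view.shape)

# Complete the missing second view from the observed first view.
views[1][~observed[1]] = generate_view(views[0][~observed[1]])

# Train one classifier per (completed) view and average their predicted probabilities.
clfs = [LogisticRegression().fit(v, y) for v in views]
avg_prob = np.mean([c.predict_proba(views[i])[:, 1] for i, c in enumerate(clfs)], axis=0)
print("training accuracy:", ((avg_prob > 0.5) == y).mean())
```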


Levenshtein Transformer

Neural Information Processing Systems

Modern neural sequence generation models are built to either generate tokens step-by-step from scratch or (iteratively) modify a sequence of tokens bounded by a fixed length. In this work, we develop Levenshtein Transformer, a new partially autoregressive model devised for more flexible and amenable sequence generation. Unlike previous approaches, the basic operations of our model are insertion and deletion. We also propose a set of new training techniques dedicated to them, effectively exploiting one as the other's learning signal thanks to their complementary nature. Experiments applying the proposed model achieve comparable or even better performance with much-improved efficiency on both generation tasks (e.g. machine translation, text summarization) and refinement tasks (e.g. automatic post-editing).
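To illustrate the two edit operations the abstract centers on, here is a toy refinement loop that deletes misplaced tokens and then inserts missing ones, using oracle Levenshtein-style alignments from Python's difflib. This is not the neural model itself, only the edit dynamics its decoder iterates.

```python
# Toy deletion + insertion refinement toward a reference sequence.
import difflib

def refine(hypothesis, reference):
    """One round of oracle deletion followed by oracle insertion."""
    # Deletion pass: keep only tokens that align with the reference.
    matcher = difflib.SequenceMatcher(a=hypothesis, b=reference)
    kept = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            kept.extend(hypothesis[i1:i2])
    # Insertion pass: splice in the reference tokens that are still missing.
    matcher = difflib.SequenceMatcher(a=kept, b=reference)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        out.extend(kept[i1:i2] if op == "equal" else reference[j1:j2])
    return out

hyp = "the the cat on mat".split()
ref = "the cat sat on the mat".split()
print(refine(hyp, ref))  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```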


Lookahead Optimizer: k steps forward, 1 step back

Neural Information Processing Systems

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam, and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of "fast weights" generated by another optimizer. We show that Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost.
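A bare-bones sketch of the update rule the title summarizes: run an inner ("fast") optimizer for k steps, then move the "slow" weights a fraction alpha of the way toward the fast weights and restart from there. The quadratic objective and the choices of k, alpha, and learning rate are arbitrary assumptions for illustration.

```python
# Minimal Lookahead loop: k fast SGD steps, then one slow interpolation step.
import numpy as np

def grad(w):
    return 2.0 * w  # gradient of the toy objective f(w) = ||w||^2

def lookahead(w0, outer_steps=50, k=5, alpha=0.5, lr=0.1):
    slow = w0.copy()
    for _ in range(outer_steps):
        fast = slow.copy()
        for _ in range(k):              # k fast steps with plain SGD as the inner optimizer
            fast -= lr * grad(fast)
        slow += alpha * (fast - slow)   # 1 slow step: interpolate toward the fast weights
    return slow

print(lookahead(np.array([5.0, -3.0])))  # converges toward the minimum at the origin
```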


Cross-lingual Language Model Pretraining

Neural Information Processing Systems

Recent studies have demonstrated the efficiency of generative pretraining for English natural language understanding. In this work, we extend this approach to multiple languages and show the effectiveness of cross-lingual pretraining. We propose two methods to learn cross-lingual language models (XLMs): one unsupervised that only relies on monolingual data, and one supervised that leverages parallel data with a new cross-lingual language model objective. We obtain state-of-the-art results on cross-lingual classification, unsupervised and supervised machine translation. On XNLI, our approach pushes the state of the art by an absolute gain of 4.9% accuracy.
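A schematic sketch of the supervised, translation-language-modeling style objective the abstract alludes to: concatenate a parallel sentence pair and mask random tokens, so a model predicting them can attend to context in either language. The whitespace tokenization, separator symbol, and masking rate are simplifications, not the actual XLM preprocessing.

```python
# Toy construction of a masked, concatenated bilingual training example.
import random

MASK = "[MASK]"

def tlm_example(src_sentence, tgt_sentence, mask_prob=0.15, seed=0):
    random.seed(seed)
    tokens = src_sentence.split() + ["</s>"] + tgt_sentence.split()
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok != "</s>" and random.random() < mask_prob:
            labels[i] = tok     # the model is trained to recover this token
            tokens[i] = MASK
    return tokens, labels

inputs, targets = tlm_example("the cat sleeps", "le chat dort")
print(inputs)   # masked, concatenated English/French token stream
print(targets)  # original tokens at the masked positions, None elsewhere
```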