Reinforcement Learning
A breakthrough in imaginative AI with experimental validation to accelerate drug discovery
The many advances in deep learning, reinforcement learning, and generative adversarial learning made since 2014 are rapidly transforming multiple industries, including search, translation, video games, retail, transportation, and many others. It is relatively easy to validate the performance of AI systems in imaging, voice, text, and other areas where human sensory systems can be used to rapidly verify the validity of the experimental results. However, in the pharmaceutical industry, the validation cycles take decades and cost billions of dollars. Most of the common questions that pharmaceutical industry executives ask the leading artificial intelligence groups worldwide deal with the novelty of the algorithms and the experimental validation of the results in mice or even in humans. There is a grave disconnect between the leaders in AI, who focus on the novelty of the algorithms, and drug discovery and development experts, who focus only on experimental data.
Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control
Levine, Nir, Chow, Yinlam, Shu, Rui, Li, Ang, Ghavamzadeh, Mohammad, Bui, Hung
Many real-world sequential decision-making problems can be formulated as optimal control with high-dimensional observations and unknown dynamics. A promising approach is to embed the high-dimensional observations into a lower-dimensional latent representation space, estimate the latent dynamics model, then utilize this model for control in the latent space. An important open question is how to learn a representation that is amenable to existing control algorithms. In this paper, we focus on learning representations for locally-linear control algorithms, such as iterative LQR (iLQR). By formulating and analyzing the representation learning problem from an optimal control perspective, we establish three underlying principles that the learned representation should comprise: 1) accurate prediction in the observation space, 2) consistency between latent and observation space dynamics, and 3) low curvature in the latent space transitions. These principles naturally correspond to a loss function that consists of three terms: prediction, consistency, and curvature (PCC). Crucially, to make PCC tractable, we derive an amortized variational bound for the PCC loss function. Extensive experiments on benchmark domains demonstrate that the new variational-PCC learning algorithm benefits from significantly more stable and reproducible training, and leads to superior control performance. Further ablation studies give support to the importance of all three PCC components for learning a good latent space for control.
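To make the three terms concrete, here is a minimal sketch of a PCC-style loss for a single transition. It is not the paper's amortized variational bound: the function names (pcc_loss, encode, decode, dynamics), the weights, and the noise scale are all assumptions made for illustration, and the curvature term is replaced by a simple finite-difference proxy that is zero whenever the latent dynamics are affine.

```python
import random


def sq_norm(v):
    """Squared Euclidean norm of a vector given as a list of floats."""
    return sum(x * x for x in v)


def pcc_loss(encode, decode, dynamics, x_t, u_t, x_next,
             weights=(1.0, 1.0, 1.0), noise_scale=0.1, seed=0):
    """Illustrative three-term loss for one transition (x_t, u_t, x_next).

    encode, decode and dynamics are hypothetical callables on lists of floats.
    """
    rng = random.Random(seed)
    z_t = encode(x_t)
    z_pred = dynamics(z_t, u_t)

    # 1) Prediction: decode the predicted latent and compare with the
    #    observed next observation.
    x_pred = decode(z_pred)
    prediction = sq_norm([a - b for a, b in zip(x_pred, x_next)])

    # 2) Consistency: the encoded next observation should match the
    #    latent dynamics prediction.
    z_next = encode(x_next)
    consistency = sq_norm([a - b for a, b in zip(z_next, z_pred)])

    # 3) Curvature proxy (assumption): second-order central difference
    #    f(z+e, u+e') + f(z-e, u-e') - 2 f(z, u), zero for affine dynamics.
    eta_z = [rng.gauss(0.0, noise_scale) for _ in z_t]
    eta_u = [rng.gauss(0.0, noise_scale) for _ in u_t]
    f_plus = dynamics([z + e for z, e in zip(z_t, eta_z)],
                      [u + e for u, e in zip(u_t, eta_u)])
    f_minus = dynamics([z - e for z, e in zip(z_t, eta_z)],
                       [u - e for u, e in zip(u_t, eta_u)])
    curvature = sq_norm([p + m - 2.0 * c
                         for p, m, c in zip(f_plus, f_minus, z_pred)])

    w_p, w_c, w_cur = weights
    total = w_p * prediction + w_c * consistency + w_cur * curvature
    terms = {"prediction": prediction,
             "consistency": consistency,
             "curvature": curvature}
    return total, terms


# Toy usage: identity encoder/decoder and linear latent dynamics z' = z + u.
identity = lambda v: list(v)
linear_dynamics = lambda z, u: [zi + ui for zi, ui in zip(z, u)]
total, terms = pcc_loss(identity, identity, linear_dynamics,
                        [1.0, 2.0], [0.5, -0.5], [1.5, 1.5])
```

In this toy setup all three terms vanish (up to floating-point error) because the dynamics are linear and the transition is exactly predictable; swapping in a nonlinear dynamics function makes the curvature proxy positive.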
Making Efficient Use of Demonstrations to Solve Hard Exploration Problems
Paine, Tom Le, Gulcehre, Caglar, Shahriari, Bobak, Denil, Misha, Hoffman, Matt, Soyer, Hubert, Tanburn, Richard, Kapturowski, Steven, Rabinowitz, Neil, Williams, Duncan, Barth-Maron, Gabriel, Wang, Ziyu, de Freitas, Nando, Team, Worlds
This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions. We also introduce a suite of eight tasks that combine these three properties, and show that R2D3 can solve several of the tasks where other state-of-the-art methods (both with and without demonstrations) fail to see even a single successful trajectory after tens of billions of steps of exploration.
Generalization in Transfer Learning
Ada, Suzan Ece, Ugur, Emre, Akin, H. Levent
Agents trained with deep reinforcement learning algorithms are capable of performing highly complex tasks, including locomotion in continuous environments. In order to attain human-level performance, the next step of research should be to investigate the ability to transfer the learning acquired in one task to a different set of tasks. Concerns about generalization and overfitting in deep reinforcement learning are not usually addressed in current transfer learning research. This issue results in underperforming benchmarks and inaccurate algorithm comparisons due to rudimentary assessments. In this study, we primarily propose regularization techniques in deep reinforcement learning for continuous control through the application of sample elimination and early stopping. First, the importance of including the training iteration among the hyperparameters in deep transfer learning problems will be emphasized. Because source task performance is not indicative of the generalization capacity of the algorithm, we start by proposing various transfer learning evaluation methods that acknowledge the training iteration as a hyperparameter. In line with this, we introduce an additional step of resorting to earlier snapshots of policy parameters depending on the target task due to overfitting to the source task. Then, in order to generate robust policies, we discard the samples that lead to overfitting via strict clipping. Furthermore, we increase the generalization capacity in widely used transfer learning benchmarks by using an entropy bonus, different critic methods, and curriculum learning in an adversarial setup. Finally, we evaluate the robustness of these techniques and algorithms on simulated robots in target environments where the morphology of the robot, gravity, and tangential friction of the environment are altered from the source environment.
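The idea of treating the training iteration as a hyperparameter can be sketched as a snapshot selection step: instead of always transferring the final policy, evaluate saved snapshots on the target task and keep the best one. The helper name select_snapshot, the snapshot labels, and the return values below are hypothetical and purely illustrative; they are not results from the study.

```python
def select_snapshot(snapshots, evaluate_on_target):
    """Pick the saved policy snapshot with the highest target-task score.

    snapshots: dict mapping training iteration -> policy object.
    evaluate_on_target: hypothetical callable returning a score for a policy.
    Returns (iteration, score); ties go to the earliest iteration.
    """
    best_iter, best_score = None, float("-inf")
    for iteration in sorted(snapshots):
        score = evaluate_on_target(snapshots[iteration])
        if score > best_score:
            best_iter, best_score = iteration, score
    return best_iter, best_score


# Toy usage with made-up numbers: the policy keeps improving on the source
# task, but its (hypothetical) target-task return peaks at an earlier snapshot.
snapshots = {100: "policy@100", 200: "policy@200", 300: "policy@300"}
toy_target_returns = {"policy@100": 0.4, "policy@200": 0.7, "policy@300": 0.5}

best_iter, best_score = select_snapshot(snapshots, toy_target_returns.get)
```

Here the selection falls on the snapshot from iteration 200 rather than the final one, which is the behavior early stopping for transfer is meant to capture.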
Reinforcement Learning -- The Third Paradigm of Machine Learning: MLmuse
In this era of automation, we often hear about autonomous cars, robots outperforming people, or a computer program defeating the best champions at a game. These are indeed some of the most striking and complex applications of deep reinforcement learning. The aim of this article is to present the concept of reinforcement learning. The literal meaning of the word 'reinforce' is 'to strengthen'. In psychology, reinforcement means establishing or encouraging a pattern of behavior by providing a stimulus that increases the probability of the desired behavior.
Reinforcement learning and deep learning pairing pushes AI limits
Reinforcement learning and deep learning arose as separate disciplines within AI, but researchers are increasingly finding that pairing the two can deliver promising applications. Deep learning has excelled at tasks like training classifiers for image and speech recognition. Reinforcement learning techniques have excelled at creating AI systems that improve through trial and error to produce game-playing bots and recommendation engines. At the Re•Work Deep Reinforcement Learning Summit in San Francisco, researchers explored how the two approaches are being combined to craft more automated and optimized reinforcement learning algorithms. "In the last six years, we've been really focusing on getting this combination of deep networks and reinforcement learning to be more stable, more reliable, more predictable," said Marc Bellemare, research scientist at Google Brain, in an interview.
Reinforcing Medical Image Classifier to Improve Generalization on Small Datasets
Al, Walid Abdullah, Yun, Il Dong
With the advent of deep learning, improved image classification with complex discriminative models has become possible. However, such deep models with increased complexity require a huge set of labeled samples to generalize well. Such classification models can easily overfit when applied to medical images because of limited training data, which is a common problem in the field of medical image analysis. This paper proposes and investigates a reinforced classifier for improving generalization when only a small amount of training data is available. Partially following the idea of reinforcement learning, the proposed classifier uses generalization feedback from a subset of the training data to update its parameters, instead of using only the conventional cross-entropy loss on the training data. We evaluate the improvement of the proposed classifier by applying it to three different classification problems and comparing it against standard deep classifiers equipped with existing overfitting-prevention techniques. Besides an overall improvement in classification performance, the proposed classifier showed remarkable characteristics of generalized learning, which can have great potential in medical classification tasks.
Machine Learning and Reinforcement Learning in Finance Coursera
The main goal of this specialization is to provide the knowledge and practical skills necessary to develop a strong foundation on core paradigms and algorithms of machine learning (ML), with a particular focus on applications of ML to various practical problems in Finance. The specialization aims at helping students to be able to solve practical ML-amenable problems that they may encounter in real life that include: (1) mapping the problem on a general landscape of available ML methods, (2) choosing particular ML approach(es) that would be most appropriate for resolving the problem, and (3) successfully implementing a solution, and assessing its performance. The specialization is designed for three categories of students:
· Practitioners working at financial institutions such as banks, asset management firms or hedge funds
· Individuals interested in applications of ML for personal day trading
· Current full-time students pursuing a degree in Finance, Statistics, Computer Science, Mathematics, Physics, Engineering or other related disciplines who want to learn about practical applications of ML in Finance.
The modules can also be taken individually to improve relevant skills in a particular area of applications of ML to finance.