Reinforcement Learning
Towards Better Opioid Antagonists Using Deep Reinforcement Learning
Deng, Jianyuan, Yang, Zhibo, Li, Yao, Samaras, Dimitris, Wang, Fusheng
Naloxone, an opioid antagonist, has been widely used to save lives from opioid overdose, a leading cause of death in the opioid epidemic. However, naloxone has short brain retention ability, which limits its therapeutic efficacy. Developing better opioid antagonists is critical in combating the opioid epidemic. Instead of exhaustively searching in a huge chemical space for better opioid antagonists, we adopt reinforcement learning, which allows efficient gradient-based search towards molecules with desired physicochemical and/or biological properties. Specifically, we implement a deep reinforcement learning framework to discover potential lead compounds as better opioid antagonists with enhanced brain retention ability. A customized multi-objective reward function is designed to bias the generation towards molecules with both sufficient opioid antagonistic effect and enhanced brain retention ability. Thorough evaluation demonstrates that with this framework, we are able to identify valid, novel and feasible molecules with multiple desired properties, which has high potential in drug discovery.
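The abstract does not spell out the reward function, so the following is only a minimal sketch of what a multi-objective reward of this kind could look like. The function name, the weighted-sum form, the zero reward for invalid molecules, and the assumption that both property scores are already normalized to [0, 1] are all illustrative choices, not the authors' design.

```python
def multi_objective_reward(is_valid, activity_score, retention_score,
                           w_activity=0.5, w_retention=0.5):
    """Hypothetical scalar reward for one generated molecule.

    is_valid        -- whether the generated string decodes to a valid molecule
    activity_score  -- assumed predicted antagonist activity, scaled to [0, 1]
    retention_score -- assumed predicted brain retention, scaled to [0, 1]
    """
    # Invalid molecules earn nothing, so the generator is pushed towards validity.
    if not is_valid:
        return 0.0
    for name, score in (("activity_score", activity_score),
                        ("retention_score", retention_score)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {score}")
    # Weighted sum: a molecule scores highly only if both properties are good.
    return w_activity * activity_score + w_retention * retention_score


if __name__ == "__main__":
    print(multi_objective_reward(True, 0.8, 0.4))
    print(multi_objective_reward(False, 0.9, 0.9))
```

A weighted sum is the simplest way to trade the two objectives off; a product or a thresholded form would penalize molecules that are strong on one property and weak on the other more heavily.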
AirRL: A Reinforcement Learning Approach to Urban Air Quality Inference
Zhong, Huiqiang, Yin, Cunxiang, Wu, Xiaohui, Luo, Jinchang, He, JiaWei
Urban air pollution has become a major environmental problem that threatens public health. It has become increasingly important to infer fine-grained urban air quality based on existing monitoring stations. One of the challenges is how to effectively select the relevant stations for air quality inference. In this paper, we propose a novel model based on reinforcement learning for urban air quality inference. The model consists of two modules: a station selector and an air quality regressor. The station selector dynamically selects the most relevant monitoring stations when inferring air quality. The air quality regressor takes in the selected stations and infers air quality with a deep neural network. We conduct experiments on a real-world air quality dataset, and our approach achieves the highest performance compared with several popular solutions. The experiments show the effectiveness of the proposed model in tackling problems of air quality inference.
Convergence of Recursive Stochastic Algorithms using Wasserstein Divergence
Gupta, Abhishek, Haskell, William B.
This paper develops a unified framework, based on iterated random operator theory, to analyze the convergence of constant stepsize recursive stochastic algorithms (RSAs) in machine learning and reinforcement learning. RSAs use randomization to efficiently compute expectations, and so their iterates form a stochastic process. The key idea is to lift the RSA into an appropriate higher-dimensional space and then express it as an equivalent Markov chain. Instead of determining the convergence of this Markov chain (which may not converge under constant stepsize), we study the convergence of the distribution of this Markov chain. To study this, we define a new notion of Wasserstein divergence. We show that if the distribution of the iterates in the Markov chain satisfies a certain contraction property with respect to the Wasserstein divergence, then the Markov chain admits an invariant distribution. Inspired by the SVRG algorithm, we develop a method to convert any RSA to a variance-reduced RSA that converges to the optimal solution in the almost sure sense or in probability. We show that convergence of a large family of constant stepsize RSAs can be understood using this framework. We apply this framework to ascertain the convergence of mini-batch SGD, forward-backward splitting with catalyst, SVRG, SAGA, empirical Q value iteration, synchronous Q-learning, enhanced policy iteration, and MDPs with a generative model. We also develop two new algorithms for reinforcement learning and establish their convergence using this framework.
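The distinction between convergence of the iterates and convergence of their distribution can be seen on a toy problem. The sketch below is not the paper's framework and does not use the Wasserstein divergence. It assumes a one-dimensional objective f(x) = x^2 / 2 with additive Gaussian gradient noise, for which the constant-stepsize SGD recursion is x_{k+1} = (1 - alpha) x_k - alpha xi_k. The iterates keep fluctuating, but their distribution settles to one with variance alpha sigma^2 / (2 - alpha).

```python
import random


def constant_step_sgd(alpha=0.1, sigma=1.0, steps=200_000, burn_in=1_000, seed=0):
    """Run constant-stepsize SGD on f(x) = x^2 / 2 with noisy gradients.

    Returns the iterates recorded after the burn-in period.
    """
    rng = random.Random(seed)
    x = 5.0  # arbitrary starting point, forgotten after burn-in
    samples = []
    for k in range(steps + burn_in):
        noisy_grad = x + rng.gauss(0.0, sigma)  # true gradient x plus noise
        x -= alpha * noisy_grad
        if k >= burn_in:
            samples.append(x)
    return samples


def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((v - mean) ** 2 for v in xs) / len(xs)


def stationary_variance(alpha, sigma):
    """Solve v = (1 - alpha)^2 v + alpha^2 sigma^2 for the invariant variance."""
    return alpha * sigma ** 2 / (2.0 - alpha)


if __name__ == "__main__":
    xs = constant_step_sgd()
    print("empirical variance:", variance(xs))
    print("predicted variance:", stationary_variance(0.1, 1.0))
```

Shrinking alpha shrinks the invariant variance, which is why decreasing stepsizes or variance reduction are needed to reach the optimum itself.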
Adaptive Conditional Neural Movement Primitives via Representation Sharing Between Supervised and Reinforcement Learning
Akbulut, M. Tuluhan, Seker, M. Yunus, Tekden, Ahmet E., Nagai, Yukie, Oztop, Erhan, Ugur, Emre
Learning by Demonstration provides a sample-efficient way to equip robots with complex sensorimotor skills in a supervised manner. Several movement primitive representations can be used for flexible motor representation and learning. A recent state-of-the-art approach is Conditional Neural Movement Primitives (CNMP), which can learn non-linear relations between environment parameters and complex multi-modal trajectories from a few expert demonstrations by forming powerful latent space representations. In this study, to improve the applicability of CNMP to changing tasks and/or environments, we couple it with a reinforcement learning agent that exploits the representations formed by the original CNMP network and learns to generate synthetic demonstrations for further learning. This enables the CNMP network to generalize to new environments by adapting its internal representations. In the current implementation, the reinforcement learning agent is triggered when a failure in task execution is detected, and the CNMP is trained with the newly discovered demonstration (trajectory), which shares essential characteristics with the original demonstrations due to the representation sharing. As a result, the overall system increases its capacity and handles situations in scenarios where the initial CNMP network cannot produce a useful trajectory. To show the validity of our proposed model, we compare our approach with the original CNMP work and other movement primitive approaches. Furthermore, we present experimental results from the implementation of the proposed model on real robotics setups, which indicate the applicability of our approach as an effective adaptive learning by demonstration system.
The 10 Best Free Artificial Intelligence And Machine Learning Courses for 2020
The demand for people with knowledge and skills in artificial intelligence (AI) and machine learning (ML) hugely outstrips the supply. This means that learning and gaining qualifications in these subjects can be a great way to enhance your career prospects. However, not everyone has the spare time and money to spend years studying for a degree or other formal qualifications. Today, with the wealth of freely available educational content online, it may not be necessary. There are so many courses, tutorials, and guides available online that it is perfectly possible to gain a thorough grounding in these subjects without paying a penny.
Reinforcement Learning: The Algorithms Changing How Computers Make Decisions
The last decade of tech was in large part defined by the advent of Deep Supervised Learning (DL). The availability of cheap data at scale, computational power, and researcher interest have made it the de facto school of algorithms used for most pattern recognition problems. Face recognition on social media, product recommendations on sites, and voice assistants like Google Assistant, Alexa, and Siri are some examples largely powered by DL. The issue with deep learning is that the resources that led to its rise are also giving rise to inequities. Today, it is tough for startups to beat 'big tech' like Apple, Google, Amazon, and Microsoft in deep learning through better research capabilities or better data.
Distributional Reinforcement Learning with Ensembles
Lindenberg, Björn, Nordqvist, Jonas, Lindahl, Karl-Olof
It is well-known that ensemble methods often provide enhanced performance in reinforcement learning. In this paper we explore this concept further by using group-aided training within the distributional reinforcement learning paradigm. Specifically, we propose an extension to categorical reinforcement learning, where distributional learning targets are implicitly based on the total information gathered by an ensemble. We empirically show that this may lead to much more robust initial learning, a stronger individual performance level and good efficiency on a per-sample basis.
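As a rough illustration of pooling information across an ensemble in categorical distributional RL, the sketch below averages the members' probability vectors over a shared, fixed set of return atoms. This is an assumption made for illustration: the abstract does not specify how the ensemble target is built, and the function names here are hypothetical.

```python
def ensemble_mean_distribution(member_probs):
    """Average categorical return distributions defined on the same atoms.

    member_probs -- list of probability vectors, one per ensemble member
    """
    if not member_probs:
        raise ValueError("need at least one ensemble member")
    n_atoms = len(member_probs[0])
    if any(len(p) != n_atoms for p in member_probs):
        raise ValueError("all members must share the same support size")
    n_members = len(member_probs)
    # Atom-wise mean; the result is again a probability vector.
    return [sum(p[i] for p in member_probs) / n_members for i in range(n_atoms)]


def expected_value(atoms, probs):
    """Expected return of a categorical distribution over the given atoms."""
    return sum(z * p for z, p in zip(atoms, probs))


if __name__ == "__main__":
    atoms = [-1.0, 0.0, 1.0]
    members = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
    mixed = ensemble_mean_distribution(members)
    print(mixed, expected_value(atoms, mixed))
```

Because every member uses the same atoms, the mixture needs no projection step and can be used directly for action selection or as the basis of a shared target.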
Fiber: A Platform for Efficient Development and Distributed Training for Reinforcement Learning and Population-Based Methods
Zhi, Jiale, Wang, Rui, Clune, Jeff, Stanley, Kenneth O.
Recent advances in machine learning are consistently enabled by increasing amounts of computation. Reinforcement learning (RL) and population-based methods in particular pose unique challenges for efficiency and flexibility to the underlying distributed computing frameworks. These challenges include frequent interaction with simulations, the need for dynamic scaling, and the need for a user interface with low adoption cost and consistency across different backends. In this paper we address these challenges while still retaining development efficiency and flexibility for both research and practical applications by introducing Fiber, a scalable distributed computing framework for RL and population-based methods. Fiber aims to significantly expand the accessibility of large-scale parallel computation to users of otherwise complicated RL and population-based approaches without the need for specialized computational expertise.
An empirical investigation of the challenges of real-world reinforcement learning
Dulac-Arnold, Gabriel, Levine, Nir, Mankowitz, Daniel J., Li, Jerry, Paduraru, Cosmin, Gowal, Sven, Hester, Todd
Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, many of the research advances in RL are hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. In this work, we identify and formalize a series of independent challenges that embody the difficulties that must be addressed for RL to be commonly deployed in real-world systems. For each challenge, we define it formally in the context of a Markov Decision Process, analyze the effects of the challenge on state-of-the-art learning algorithms, and present some existing attempts at tackling it. We believe that an approach that addresses our set of proposed challenges would be readily deployable in a large number of real-world problems. Our proposed challenges are implemented in a suite of continuous control environments called realworldrl-suite, which we propose as an open-source benchmark.
Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning
Mousavi, Ali, Li, Lihong, Liu, Qiang, Zhou, Denny
Off-policy estimation for long-horizon problems is important in many real-life applications such as healthcare and robotics, where high-fidelity simulators may not be available and on-policy evaluation is expensive or impossible. Recently, Liu et al. (2018) proposed an approach that avoids the curse of horizon suffered by typical importance-sampling-based methods. While showing promising results, this approach is limited in practice as it requires data be drawn from the stationary distribution of a known behavior policy. In this work, we propose a novel approach that eliminates such limitations. In particular, we formulate the problem as solving for the fixed point of a certain operator. Using tools from Reproducing Kernel Hilbert Spaces (RKHSs), we develop a new estimator that computes importance ratios of stationary distributions, without knowledge of how the off-policy data are collected. We analyze its asymptotic consistency and finite-sample generalization. Experiments on benchmarks verify the effectiveness of our approach.
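The curse of horizon mentioned above can be made concrete with a small calculation. The sketch below is not the paper's estimator. It assumes state-independent target and behavior policies over a finite action set, so that the per-step importance ratios are independent with mean 1. Under that assumption the variance of the trajectory-level weight (the product of per-step ratios) is (E[rho^2])^T - 1, which grows exponentially in the horizon T.

```python
def step_ratio_second_moment(target, behavior):
    """E_b[(pi(a) / b(a))^2] = sum_a pi(a)^2 / b(a) for one step."""
    if len(target) != len(behavior):
        raise ValueError("policies must cover the same actions")
    return sum(t * t / b for t, b in zip(target, behavior))


def trajectory_weight_variance(target, behavior, horizon):
    """Variance of the product of `horizon` independent per-step ratios.

    Each ratio has mean 1 under the behavior policy, so the product has
    mean 1 and second moment (E[rho^2]) ** horizon.
    """
    return step_ratio_second_moment(target, behavior) ** horizon - 1.0


if __name__ == "__main__":
    target = [0.8, 0.2]    # hypothetical target policy
    behavior = [0.5, 0.5]  # hypothetical behavior policy
    for horizon in (1, 10, 50, 100):
        print(horizon, trajectory_weight_variance(target, behavior, horizon))
```

Estimators based on ratios of stationary distributions avoid this product altogether, which is the motivation for the line of work the abstract builds on.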