Goto

Collaborating Authors

 Reinforcement Learning


MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL)

#artificialintelligence

For more lecture videos on deep learning, reinforcement learning (RL), artificial intelligence (AI & AGI), and podcast conversations, visit our website or follow TensorFlow code tutorials on our GitHub repo.


Machine Learning Customizes Powered Knee Prosthetics for New Users in Minutes

#artificialintelligence

A new technique could reduce the time and discomfort of adjusting to a new prosthetic knee. A collaboration between researchers from North Carolina State University, the University of North Carolina and Arizona State University has resulted in a new technique that enables more rapid "tuning" of powered prosthetic knees, allowing patients to comfortably walk with a new prosthetic device in minutes, rather than hours after the device is first fitted After receiving the prosthetic knee, the device is tuned to tweak 12 different control parameters to accommodate the specific patient and address prosthesis dynamics like joint stiffness throughout the entire gait cycle. Traditionally, a practitioner works directly with the user to modify a handful of parameters in a process that could take several hours. However, by using a computer program that utilizes reinforcement learning--a type of machine learning--to modify all 12 parameters simultaneously, the new system allows patients to use their powered prosthetic knee to walk on a level surface after approximately 10 minutes of use. "We begin by giving a patient a powered prosthetic knee with a randomly selected set of parameters," Helen Huang, co-author of a paper on the work and a professor in the Joint Department of Biomedical Engineering at NC State and UNC, said in a statement.


Federated Reinforcement Learning

arXiv.org Artificial Intelligence

In reinforcement learning, building policies of high-quality is challenging when the feature space of states is small and the training data is limited. Directly transferring data or knowledge from an agent to another agent will not work due to the privacy requirement of data and models. In this paper, we propose a novel reinforcement learning approach to considering the privacy requirement and building Q-network for each agent with the help of other agents, namely federated reinforcement learning (FRL). To protect the privacy of data and models, we exploit Gausian differentials on the information shared with each other when updating their local models. In the experiment, we evaluate our FRL framework in two diverse domains, Grid-world and Text2Action domains, by comparing to various baselines.


Provably efficient RL with Rich Observations via Latent State Decoding

arXiv.org Machine Learning

We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps---where previously decoded latent states provide labels for later regression problems---and use it to construct good exploration policies. We provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over $Q$-learning with na\"ive exploration, even when $Q$-learning has cheating access to latent states.


google/dopamine

#artificialintelligence

Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. It aims to fill the need for a small, easily grokked codebase in which users can freely experiment with wild ideas (speculative research). In the spirit of these principles, this first version focuses on supporting the state-of-the-art, single-GPU Rainbow agent (Hessel et al., 2018) applied to Atari 2600 game-playing (Bellemare et al., 2013). Specifically, our Rainbow agent implements the three components identified as most important by Hessel et al.: For completeness, we also provide an implementation of DQN (Mnih et al., 2015). For additional details, please see our documentation.


Never Forget: Balancing Exploration and Exploitation via Learning Optical Flow

arXiv.org Artificial Intelligence

Exploration bonus derived from the novelty of the states in an environment has become a popular approach to motivate exploration for deep reinforcement learning agents in the past few years. Recent methods such as curiosity-driven exploration usually estimate the novelty of new observations by the prediction errors of their system dynamics models. Due to the capacity limitation of the models and difficulty of performing next-frame prediction, however, these methods typically fail to balance between exploration and exploitation in high-dimensional observation tasks, resulting in the agents forgetting the visited paths and exploring those states repeatedly. Such inefficient exploration behavior causes significant performance drops, especially in large environments with sparse reward signals. In this paper, we propose to introduce the concept of optical flow estimation from the field of computer vision to deal with the above issue. We propose to employ optical flow estimation errors to examine the novelty of new observations, such that agents are able to memorize and understand the visited states in a more comprehensive fashion. We compare our method against the previous approaches in a number of experimental experiments. Our results indicate that the proposed method appears to deliver superior and long-lasting performance than the previous methods. We further provide a set of comprehensive ablative analysis of the proposed method, and investigate the impact of optical flow estimation on the learning curves of the DRL agents.


Model-based Deep Reinforcement Learning for Dynamic Portfolio Optimization

arXiv.org Machine Learning

Dynamic portfolio optimization is the process of sequentially allocating wealth to a collection of assets in some consecutive trading periods, based on investors' return-risk profile. Automating this process with machine learning remains a challenging problem. Here, we design a deep reinforcement learning (RL) architecture with an autonomous trading agent such that, investment decisions and actions are made periodically, based on a global objective, with autonomy. In particular, without relying on a purely model-free RL agent, we train our trading agent using a novel RL architecture consisting of an infused prediction module (IPM), a generative adversarial data augmentation module (DAM) and a behavior cloning module (BCM). Our model-based approach works with both on-policy or off-policy RL algorithms. We further design the back-testing and execution engine which interact with the RL agent in real time. Using historical {\em real} financial market data, we simulate trading with practical constraints, and demonstrate that our proposed model is robust, profitable and risk-sensitive, as compared to baseline trading strategies and model-free RL agents from prior work.


Dynamic Measurement Scheduling for Event Forecasting using Deep RL

arXiv.org Machine Learning

Current clinical practice for monitoring patients' health follows either regular or heuristic-based lab test (e.g. blood test) scheduling. Such practice not only gives rise to redundant measurements accruing cost, but may even cause unnecessary patient discomfort. From the computational perspective, heuristic-based test scheduling might lead to reduced accuracy of clinical forecasting models. A data-driven measurement scheduling is likely to lead to both more accurate predictions and less measurement costs. We address the scheduling problem using deep reinforcement learning (RL) and propose a general and scalable framework to achieve high predictive gain and low measurement cost, by scheduling fewer, but strategically timed tests. Using simulations we show that our policy outperforms heuristic-based measurement scheduling with higher predictive gain and lower cost. We then learn a scheduling policy for mortality forecasting in the real-world clinical dataset (MIMIC3). Our policy decreases the total number of measurements by 31% without reducing the predictive performance, or improves 3 times more predictive gain with the same number of measurements using off-policy policy evaluation.


Learn to Interpret Atari Agents

arXiv.org Machine Learning

Deep Reinforcement Learning (DeepRL) agents surpass human-level performances in a multitude of tasks. However, the direct mapping from states to actions makes it hard to interpret the rationale behind the decision making of agents. In contrast to previous a-posteriori methods of visualizing DeepRL policies, we propose an end-to-end trainable frameworkbased on Rainbow, a representative Deep Q-Network (DQN) agent. Our method automatically learns important regions in the input domain, which enables characterizations of the decision makingand interpretations for non-intuitive behaviors. Hence we name it Region Sensitive Rainbow (RS-Rainbow). RS-Rainbow utilizes a simple yet effective mechanism to incorporate visualization ability into the learning model, not only improving model interpretability, but leading to improved performance. Extensive experiments on the challenging platform of Atari 2600 demonstrate thesuperiority of RS-Rainbow. In particular, our agent achieves state of the art at just 25% of the training frames. Demonstrations and code are available at https://github.com/yz93/Learn-to-


Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning

arXiv.org Artificial Intelligence

We investigate how reinforcement learning agents can learn to cooperate. Drawing inspiration from human societies, in which successful coordination of many individuals is often facilitated by hierarchical organisation, we introduce Feudal Multi-agent Hierarchies (FMH). In this framework, a 'manager' agent, which is tasked with maximising the environmentally-determined reward function, learns to communicate subgoals to multiple, simultaneously-operating, 'worker' agents. Workers, which are rewarded for achieving managerial subgoals, take concurrent actions in the world. We outline the structure of FMH and demonstrate its potential for decentralised learning and control. We find that, given an adequate set of subgoals from which to choose, FMH performs, and particularly scales, substantially better than cooperative approaches that use a shared reward function.