Reinforcement Learning
lexfridman/mit-deep-learning
This repository is a collection of tutorials for MIT Deep Learning courses. More are added as the courses progress. DeepTraffic is a deep reinforcement learning competition. The goal is to create a neural network that drives a vehicle (or multiple vehicles) as fast as possible through dense highway traffic.
Large-Scale Study of Curiosity-Driven Learning
Reinforcement learning algorithms rely on carefully engineering environment rewards that are extrinsic to the agent. However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need for developing reward functions that are intrinsic to the agent. Curiosity is a type of intrinsic reward function which uses prediction error as reward signal. In this paper: (a) We perform the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across 54 standard benchmark environments, including the Atari game suite. Our results show surprisingly good performance, and a high degree of alignment between the intrinsic curiosity objective and the hand-designed extrinsic rewards of many game environments.
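To make the idea of prediction error as a reward signal concrete, here is a minimal Python sketch. The forward model (`OffsetForwardModel`), its learning rate, and the mean-squared-error form of the reward are illustrative assumptions, not the architecture used in the study.

```python
# Sketch only: the model class, its update rule, and the MSE reward are assumptions.

def curiosity_reward(predicted_next, actual_next):
    """Intrinsic reward: mean squared error between predicted and observed next features."""
    if len(predicted_next) != len(actual_next):
        raise ValueError("feature vectors must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted_next, actual_next)]
    return sum(squared) / len(squared)


class OffsetForwardModel:
    """Hypothetical forward model: predicts next = state + a learned per-action offset."""

    def __init__(self, n_actions, dim, lr=0.5):
        self.offsets = [[0.0] * dim for _ in range(n_actions)]
        self.lr = lr

    def predict(self, state, action):
        return [s + o for s, o in zip(state, self.offsets[action])]

    def update(self, state, action, next_state):
        """Move the prediction for this action a fraction `lr` toward the observation."""
        predicted = self.predict(state, action)
        for i, (p, n) in enumerate(zip(predicted, next_state)):
            self.offsets[action][i] += self.lr * (n - p)


if __name__ == "__main__":
    model = OffsetForwardModel(n_actions=2, dim=2)
    state, action, next_state = [0.0, 0.0], 0, [1.0, 0.0]
    for visit in range(4):
        reward = curiosity_reward(model.predict(state, action), next_state)
        print(f"visit {visit}: intrinsic reward = {reward:.4f}")
        model.update(state, action, next_state)
```

Revisiting the same transition shrinks the reward as the model learns it, which is what pushes a purely curiosity-driven agent toward states it cannot yet predict.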
Balancing Two-Player Stochastic Games with Soft Q-Learning
Grau-Moya, Jordi, Leibfried, Felix, Bou-Ammar, Haitham
Within the context of video games the notion of perfectly rational agents can be undesirable as it leads to uninteresting situations, where humans face tough adversarial decision makers. Current frameworks for stochastic games and reinforcement learning prohibit tuneable strategies as they seek optimal performance. In this paper, we enable such tuneable behaviour by generalising soft Q-learning to stochastic games, where more than one agent interacts strategically. We contribute both theoretically and empirically. On the theory side, we show that games with soft Q-learning exhibit a unique value and generalise team games and zero-sum games far beyond these two extremes to cover a continuous spectrum of gaming behaviour. Experimentally, we show how tuning agents' constraints affects performance and demonstrate, through a neural network architecture, how to reliably balance games with high-dimensional representations.
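One way to picture the continuous spectrum between team games and zero-sum games is a log-sum-exp ("soft") value operator with one temperature per agent. The sketch below is written under that assumption: the nested form, the prior policies, and the parameter names `beta_pl` and `beta_op` are illustrative choices, not details given in the abstract.

```python
import math

# Sketch only: the nested soft operator and all parameter names are assumptions.

def soft_value(values, prior, beta):
    """Soft aggregate (1/beta) * log(sum_i prior_i * exp(beta * value_i)).

    beta -> +inf approaches max, beta -> -inf approaches min,
    and beta == 0 is defined here as the expectation under the prior.
    """
    if len(values) != len(prior):
        raise ValueError("values and prior must have the same length")
    if beta == 0.0:
        return sum(p * v for p, v in zip(prior, values))
    logits = [beta * v + math.log(p) for p, v in zip(prior, values) if p > 0.0]
    shift = max(logits)  # subtract the max for numerical stability
    return (shift + math.log(sum(math.exp(z - shift) for z in logits))) / beta


def two_player_soft_value(q, prior_pl, prior_op, beta_pl, beta_op):
    """Soft value of one state; q[a][b] is the payoff for player action a, opponent action b."""
    inner = [soft_value(row, prior_op, beta_op) for row in q]
    return soft_value(inner, prior_pl, beta_pl)


if __name__ == "__main__":
    q = [[3.0, 0.0], [1.0, 2.0]]
    uniform = [0.5, 0.5]
    for beta_op in (-50.0, 0.0, 50.0):
        value = two_player_soft_value(q, uniform, uniform, 50.0, beta_op)
        print(f"beta_op = {beta_op:+.0f}: value = {value:.3f}")
```

In this sketch a strongly negative `beta_op` makes the opponent behave adversarially, a strongly positive one makes it cooperative, and values in between give the tuneable behaviour the abstract describes.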
Uncertainty-Based Out-of-Distribution Detection in Deep Reinforcement Learning
Sedlmeier, Andreas, Gabor, Thomas, Phan, Thomy, Belzner, Lenz, Linnhoff-Popien, Claudia
We consider the problem of detecting out-of-distribution (OOD) samples in deep reinforcement learning. In a value based reinforcement learning setting, we propose to use uncertainty estimation techniques directly on the agent's value estimating neural network to detect OOD samples. The focus of our work lies in analyzing the suitability of approximate Bayesian inference methods and related ensembling techniques that generate uncertainty estimates. Although prior work has shown that dropout-based variational inference techniques and bootstrap-based approaches can be used to model epistemic uncertainty, the suitability for detecting OOD samples in deep reinforcement learning remains an open question. Our results show that uncertainty estimation can be used to differentiate in- from out-of-distribution samples. Over the complete training process of the reinforcement learning agents, bootstrap-based approaches tend to produce more reliable epistemic uncertainty estimates, when compared to dropout-based approaches.
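A minimal sketch of how ensemble disagreement might be turned into an OOD flag follows. It assumes a bootstrap-style ensemble of K value heads that each output one Q-value per action. The variance-based score, the function names, and the threshold value are hypothetical and are not taken from the paper.

```python
from statistics import mean, pvariance

# Sketch only: the scoring rule and the threshold are illustrative assumptions.

def ensemble_uncertainty(q_values_per_head):
    """Average, over actions, of the variance of Q-values across ensemble heads.

    q_values_per_head: list of K lists, each holding one Q-value per action.
    """
    n_actions = len(q_values_per_head[0])
    if any(len(head) != n_actions for head in q_values_per_head):
        raise ValueError("every head must output the same number of actions")
    per_action_variance = [
        pvariance([head[a] for head in q_values_per_head]) for a in range(n_actions)
    ]
    return mean(per_action_variance)


def is_out_of_distribution(q_values_per_head, threshold):
    """Flag a state as OOD when the heads disagree more than `threshold`."""
    return ensemble_uncertainty(q_values_per_head) > threshold


if __name__ == "__main__":
    familiar_state = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9]]    # heads agree
    unfamiliar_state = [[1.0, 2.0], [5.0, -3.0], [-4.0, 6.0]]  # heads disagree
    for name, heads in (("familiar", familiar_state), ("unfamiliar", unfamiliar_state)):
        print(name, ensemble_uncertainty(heads), is_out_of_distribution(heads, 0.5))
```

In practice the threshold would have to be calibrated on states known to be in-distribution. The fixed value of 0.5 here is only for illustration.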
Risk-Aware Active Inverse Reinforcement Learning
Brown, Daniel S., Cui, Yuchen, Niekum, Scott
Active learning from demonstration allows a robot to query a human for specific types of input to achieve efficient learning. Existing work has explored a variety of active query strategies; however, to our knowledge, none of these strategies directly minimize the performance risk of the policy the robot is learning. Utilizing recent advances in performance bounds for inverse reinforcement learning, we propose a risk-aware active inverse reinforcement learning algorithm that focuses active queries on areas of the state space with the potential for large generalization error. We show that risk-aware active learning outperforms standard active IRL approaches on gridworld, simulated driving, and table setting tasks, while also providing a performance-based stopping criterion that allows a robot to know when it has received enough demonstrations to safely perform a task.
Move 37 Explained
Why was AlphaGo's Move 37 against Lee Sedol so significant? Why was it so important that I named my 10-week course on deep reinforcement learning after it? In this final video of my course, I'll explain what Move 37 symbolized for humanity and detail 3 examples of how it will affect healthcare, design, and decision-making. We'll go through a code example of a Generative Adversarial Network and even discuss China's ambitious 2030 AI initiative. There's a lot that I cover in this video, and I hope that it helps connect the dots.
Credit Assignment Techniques in Stochastic Computation Graphs
Weber, Théophane, Heess, Nicolas, Buesing, Lars, Silver, David
Stochastic computation graphs (SCGs) provide a formalism to represent structured optimization problems arising in artificial intelligence, including supervised, unsupervised, and reinforcement learning. Previous work has shown that an unbiased estimator of the gradient of the expected loss of SCGs can be derived from a single principle. However, this estimator often has high variance and requires a full model evaluation per data point, making this algorithm costly in large graphs. In this work, we address these problems by generalizing concepts from the reinforcement learning literature. We introduce the concepts of value functions, baselines and critics for arbitrary SCGs, and show how to use them to derive lower-variance gradient estimates from partial model evaluations, paving the way towards general and efficient credit assignment for gradient-based optimization. In doing so, we demonstrate how our results unify recent advances in the probabilistic inference and reinforcement learning literature.
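The effect of a baseline on gradient variance can be seen in the smallest possible stochastic graph: a single Bernoulli node feeding a cost. The sketch below enumerates both outcomes exactly rather than sampling. The cost function, the parameter value, and the baseline are made-up illustrative choices, and the estimator shown is the standard score-function (REINFORCE) form rather than the paper's general construction.

```python
# Sketch only: a one-node example with hypothetical cost and baseline values.

def score_function_estimates(theta, cost, baseline=0.0):
    """Enumerate x ~ Bernoulli(theta) and return [(probability, gradient estimate)].

    Each estimate is (cost(x) - baseline) * d/dtheta log p(x; theta).
    """
    outcomes = []
    for x, prob in ((0, 1.0 - theta), (1, theta)):
        score = 1.0 / theta if x == 1 else -1.0 / (1.0 - theta)
        outcomes.append((prob, (cost(x) - baseline) * score))
    return outcomes


def mean_and_variance(outcomes):
    """Exact mean and variance of the estimator over the enumerated outcomes."""
    mean = sum(p * g for p, g in outcomes)
    variance = sum(p * (g - mean) ** 2 for p, g in outcomes)
    return mean, variance


if __name__ == "__main__":
    cost = lambda x: 10.0 + x  # true gradient of E[cost] w.r.t. theta is cost(1) - cost(0)
    for baseline in (0.0, 10.5):
        mean, variance = mean_and_variance(score_function_estimates(0.5, cost, baseline))
        print(f"baseline = {baseline}: mean = {mean}, variance = {variance}")
```

Both settings give the same expected gradient, so the baseline leaves the estimator unbiased, while a well-chosen baseline removes most of the variance.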
A dual mode adaptive basal-bolus advisor based on reinforcement learning
Sun, Qingnan, Jankovic, Marko V., Budzinski, João, Moore, Brett, Diem, Peter, Stettler, Christoph, Mougiakakou, Stavroula G.
Self-monitoring of blood glucose (SMBG) and continuous glucose monitoring (CGM) are commonly used by type 1 diabetes (T1D) patients to measure glucose concentrations. The proposed adaptive basal-bolus algorithm (ABBA) supports inputs from either SMBG or CGM devices to provide personalised suggestions for the daily basal rate and prandial insulin doses on the basis of the patients' glucose level on the previous day. The ABBA is based on reinforcement learning (RL), a type of artificial intelligence, and was validated in silico with an FDA-accepted population of 100 adults under different realistic scenarios lasting three simulated months. The scenarios involve three main meals and one bedtime snack per day, along with different variabilities and uncertainties for insulin sensitivity, mealtime, carbohydrate amount, and glucose measurement time. The results indicate that the proposed approach achieves comparable performance with CGM or SMBG as input signals, without influencing the total daily insulin dose. The results are a promising indication that AI algorithmic approaches can provide personalised adaptive insulin optimisation and achieve glucose control, independently of the type of glucose monitoring technology. Manuscript received August 30, 2018. This research was carried out within the framework of the MyTreat research and development project, supported by the Swiss Commission of Technology and Innovation (CTI) under Grant 18172.1 PFLS-LS.
Deep Reinforcement Learning for Imbalanced Classification
Lin, Enlu, Chen, Qiong, Qi, Xiaoming
Data in real-world applications often exhibit a skewed class distribution, which poses a serious challenge for machine learning. Conventional classification algorithms are not effective in the case of imbalanced data distribution, and may fail when the data distribution is highly imbalanced. To address this issue, we propose a general imbalanced classification model based on deep reinforcement learning. We formulate the classification problem as a sequential decision-making process and solve it with a deep Q-learning network. The agent performs a classification action on one sample at each time step, and the environment evaluates the classification action and returns a reward to the agent. The reward for a minority-class sample is larger, so the agent is more sensitive to the minority class. The agent finally finds an optimal classification policy in imbalanced data under the guidance of a specific reward function and a beneficial learning environment. Experiments show that our proposed model outperforms the other imbalanced classification algorithms, identifies more minority samples, and achieves strong classification performance.
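The reward shaping described above can be sketched in a few lines of Python. The specific magnitudes (1.0 for minority samples, a smaller `majority_scale` for majority samples), the symmetric penalty for mistakes, and the function names are assumptions made for illustration. The abstract states only that the minority-class reward is larger.

```python
# Sketch only: reward magnitudes and names are illustrative assumptions.

def classification_reward(action, label, minority_label, majority_scale=0.1):
    """Reward for classifying one sample: larger magnitude for minority samples."""
    magnitude = 1.0 if label == minority_label else majority_scale
    return magnitude if action == label else -magnitude


def episode_return(policy, samples, minority_label, majority_scale=0.1):
    """Sum of rewards when `policy` classifies each (features, label) pair in turn."""
    return sum(
        classification_reward(policy(features), label, minority_label, majority_scale)
        for features, label in samples
    )


if __name__ == "__main__":
    # Nine majority samples (label 0) and one minority sample (label 1).
    samples = [(i, 0) for i in range(9)] + [(9, 1)]
    always_majority = lambda features: 0
    print(episode_return(always_majority, samples, minority_label=1))
```

With these illustrative magnitudes, a policy that always predicts the majority class is 90% accurate yet earns a negative return, which is the pressure that makes the agent attend to minority samples.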