Reinforcement Learning
Online Model Selection for Reinforcement Learning with Function Approximation
Lee, Jonathan N., Pacchiano, Aldo, Muthukumar, Vidya, Kong, Weihao, Brunskill, Emma
Deep reinforcement learning has achieved impressive successes yet often requires a very large amount of interaction data. This result is perhaps unsurprising, as using complicated function approximation often requires more data to fit, and early theoretical results on linear Markov decision processes provide regret bounds that scale with the dimension of the linear approximation. Ideally, we would like to automatically identify the minimal dimension of the approximation that is sufficient to encode an optimal policy. Towards this end, we consider the problem of model selection in RL with function approximation, given a set of candidate RL algorithms with known regret guarantees. The learner's goal is to adapt to the complexity of the optimal algorithm without knowing it \textit{a priori}. We present a meta-algorithm that successively rejects increasingly complex models using a simple statistical test. Given at least one candidate that satisfies realizability, we prove the meta-algorithm adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret compared to the optimal candidate's $\tilde{O}(\sqrt T)$ regret, where $T$ is the number of episodes and $L$ is the number of algorithms. The dimension and horizon dependencies remain optimal with respect to the best candidate, and our meta-algorithmic approach is flexible to incorporate multiple candidate algorithms and models. Finally, we show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds that depend on the gaps between the maximal values attainable by the candidates.
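To make the gap between the two rates in this abstract concrete, here is a minimal sketch that compares only the leading terms $L^{5/6} T^{2/3}$ and $\sqrt{T}$. It assumes all constants, logarithmic factors, and dimension and horizon dependencies are dropped, so the numbers illustrate scaling only and are not results from the paper. The function names are hypothetical.

```python
def meta_rate(num_algorithms: int, num_episodes: int) -> float:
    """Leading term L^(5/6) * T^(2/3) of the meta-algorithm's regret bound.

    Assumption: constants and log factors hidden by the tilde-O are ignored.
    """
    return num_algorithms ** (5 / 6) * num_episodes ** (2 / 3)


def best_candidate_rate(num_episodes: int) -> float:
    """Leading term sqrt(T) of the best candidate's regret bound."""
    return num_episodes ** 0.5


def overhead_factor(num_algorithms: int, num_episodes: int) -> float:
    """Ratio of the two rates, which simplifies to L^(5/6) * T^(1/6)."""
    return meta_rate(num_algorithms, num_episodes) / best_candidate_rate(num_episodes)


if __name__ == "__main__":
    # Illustrative values only: L = 4 candidates, increasing episode counts.
    for T in (10**3, 10**6, 10**9):
        per_episode = meta_rate(4, T) / T
        print(T, overhead_factor(4, T), per_episode)
```

Under these assumptions the overhead factor grows only as $T^{1/6}$, and the per-episode regret $L^{5/6} T^{-1/3}$ still shrinks toward zero as $T$ grows, which is what it means for the meta-algorithm's regret to remain sublinear.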
Teaching KNIME to Play Tic-Tac-Toe
In this blog post I want to introduce some basic concepts of reinforcement learning, some important terminology, and show a simple use case where I create a game-playing AI in KNIME Analytics Platform. After reading this, I hope you'll have a better understanding of the usefulness of reinforcement learning, as well as some key vocabulary to facilitate learning more. You may have heard of Reinforcement Learning (RL) being used to train robots to walk or gently pick up objects; or perhaps you may have heard of its uses in the discovery of new chemical compounds for medical use. It's even being applied to regulating vehicle and network traffic! Reinforcement learning is an area of Machine Learning and has become a broad field of study with many different algorithmic frameworks.
Create a Custom Deep Reinforcement Learning Environment in UE4
While the scope of reinforcement learning (RL) is likely to soon extend far beyond computer simulation, today the main location for training RL agents is within the digital environment. In the world of artificial intelligence, simulators are often the environments in which an algorithm functions. For humans, we are born directly into our simulator and it requires no effort on our part to go on functioning. We call this simulator the universe and it exists whether we believe in it or not. Similarly, the laws of physics apply whether you acknowledge them or not. They require no effort or acquiescence on our part.
DeepMind open-sources Lab2D, a grid-based environment for reinforcement learning research
DeepMind this week open-sourced Lab2D, a software system designed to support the creation of 2D environments for AI and machine learning research. The Alphabet subsidiary says that Lab2D was built with the needs of deep reinforcement learning researchers in mind, but that it can be useful beyond that particular subfield of machine learning. The DeepMind team behind Lab2D makes the case that 2D environments are inherently easier to understand than 3D ones at little loss of expressiveness. Even a game as simple as Pong, which essentially consists of three moving rectangles on a black background, can capture something fundamental about the real game of table tennis, the researchers assert. This abstraction ostensibly makes it easier to capture the essence of problems and concepts in AI. "Rich complexity along numerous dimensions can be studied in 2D just as readily as in 3D, if not more so … In addition, 2D worlds are significantly less resource-intensive to run, and typically do not require any specialized hardware (like GPUs) to attain reasonable performance," the researchers continued in their paper describing Lab2D. "2D worlds have been successfully used to study problems as diverse as social complexity, navigation, imperfect information, abstract reasoning, exploration, and many more."
Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?
de Witt, Christian Schroeder, Gupta, Tarun, Makoviichuk, Denys, Makoviychuk, Viktor, Torr, Philip H. S., Sun, Mingfei, Whiteson, Shimon
Most recently developed approaches to cooperative multi-agent reinforcement learning in the \emph{centralized training with decentralized execution} setting involve estimating a centralized, joint value function. In this paper, we demonstrate that, despite its various theoretical shortcomings, Independent PPO (IPPO), a form of independent learning in which each agent simply estimates its local value function, can perform just as well as or better than state-of-the-art joint learning approaches on the popular multi-agent benchmark suite SMAC with little hyperparameter tuning. We also compare IPPO to several variants; the results suggest that IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity.
Using Unity to Help Solve Intelligence
Ward, Tom, Bolt, Andrew, Hemmings, Nik, Carter, Simon, Sanchez, Manuel, Barreira, Ricardo, Noury, Seb, Anderson, Keith, Lemmon, Jay, Coe, Jonathan, Trochim, Piotr, Handley, Tom, Bolton, Adrian
In the pursuit of artificial general intelligence, our most significant measurement of progress is an agent's ability to achieve goals in a wide range of environments. Existing platforms for constructing such environments are typically constrained by the technologies they are founded on, and are therefore only able to provide a subset of scenarios necessary to evaluate progress. To overcome these shortcomings, we present our use of Unity, a widely recognized and comprehensive game engine, to create more diverse, complex, virtual simulations. We describe the concepts and components developed to simplify the authoring of these environments, intended for use predominantly in the field of reinforcement learning. We also introduce a practical approach to packaging and re-distributing environments in a way that attempts to improve the robustness and reproducibility of experiment results. To illustrate the versatility of our use of Unity compared to other solutions, we highlight environments already created using our approach from published papers. We hope that others can draw inspiration from how we adapted Unity to our needs, and anticipate increasingly varied and complex environments to emerge from our approach as familiarity grows.
Indoor Point-to-Point Navigation with Deep Reinforcement Learning and Ultra-wideband
Sutera, Enrico, Mazzia, Vittorio, Salvetti, Francesco, Fantin, Giovanni, Chiaberge, Marcello
Indoor autonomous navigation requires a precise and accurate localization system able to guide robots through cluttered, unstructured and dynamic environments. Ultra-wideband (UWB) technology, as an indoor positioning system, offers precise localization and tracking, but moving obstacles and non-line-of-sight occurrences can generate noisy and unreliable signals. That, combined with sensor noise, unmodeled dynamics and environment changes, can result in a failure of the robot's guidance algorithm. We demonstrate how a power-efficient, computationally inexpensive point-to-point local planner, learnt with deep reinforcement learning (RL) and combined with UWB localization technology, can constitute a complete short-range guidance system that is robust and resilient to noise. We trained the RL agent in a simulated environment that encapsulates the robot dynamics and task constraints, and then tested the learnt point-to-point navigation policies in a real setting with more than two hundred experimental evaluations using UWB localization. Our results show that the computationally efficient end-to-end policy learnt in plain simulation, which directly maps low-range sensor signals to robot controls, deployed in combination with noisy ultra-wideband localization in a real environment, can provide a robust, scalable, low-cost navigation system at the edge.
Machine Learning
In this era of big data, there is an increasing need to develop and deploy algorithms that can analyze and identify connections in that data. Using machine learning (a subset of artificial intelligence) it is now possible to create computer systems that automatically improve with experience. This technology has numerous real-world applications including robotic control, data mining, autonomous navigation, and bioinformatics. This course features classroom videos and assignments adapted from the CS229 graduate course as delivered on-campus at Stanford in Autumn 2018 and Autumn 2019. In order to make the content and workload more manageable for working professionals, the course has been split into two parts, XCS229i: Machine Learning and XCS229ii: Machine Learning Strategy and Intro to Reinforcement Learning.
Combining Reinforcement Learning with Model Predictive Control for On-Ramp Merging
Lubars, Joseph, Gupta, Harsh, Raja, Adnan, Srikant, R., Li, Liyun, Wu, Xinzhou
We consider the problem of designing an algorithm to allow a car to autonomously merge on to a highway from an on-ramp. Two broad classes of techniques have been proposed to solve motion planning problems in autonomous driving: Model Predictive Control (MPC) and Reinforcement Learning (RL). In this paper, we first establish the strengths and weaknesses of state-of-the-art MPC and RL-based techniques through simulations. We show that the performance of the RL agent is worse than that of the MPC solution from the perspective of safety and robustness to out-of-distribution traffic patterns, i.e., traffic patterns which were not seen by the RL agent during training. On the other hand, the performance of the RL agent is better than that of the MPC solution when it comes to efficiency and passenger comfort. We subsequently present an algorithm which blends the model-free RL agent with the MPC solution and show that it provides better trade-offs between all metrics -- passenger comfort, efficiency, crash rate and robustness.
Explaining Conditions for Reinforcement Learning Behaviors from Real and Imagined Data
Acharya, Aastha, Russell, Rebecca, Ahmed, Nisar R.
The deployment of reinforcement learning (RL) in the real world comes with challenges in calibrating user trust and expectations. As a step toward developing RL systems that are able to communicate their competencies, we present a method of generating human-interpretable abstract behavior models that identify the experiential conditions leading to different task execution strategies and outcomes. Our approach consists of extracting experiential features from state representations, abstracting strategy descriptors from trajectories, and training an interpretable decision tree that identifies the conditions most predictive of different RL behaviors. We demonstrate our method on trajectory data generated from interactions with the environment and on imagined trajectory data that comes from a trained probabilistic world model in a model-based RL setting.