 Reinforcement Learning


Book Review: Deep Reinforcement Learning Hands-On - insideBIGDATA

#artificialintelligence

Reinforcement learning (RL) is a hugely popular area of deep learning, and many data scientists are exploring this AI technology to broaden their skill set to include a number of important problem domains like chatbots, robotics, discrete optimization, web automation and much more. As a result of this widespread interest in RL, there are many available educational resources specifically tailored to this class of deep learning: boot camps, training certificates, educational specializations, etc. But if you're a data scientist who has been programming in Python (with object-oriented features) for a while, and has some experience with other forms of deep learning using a framework like TensorFlow, then this new book, "Deep Reinforcement Learning Hands-On," by Maxim Lapan from Packt, might be a great way to kick-start yourself into becoming productive with RL. RL development is being driven by a number of large companies and research groups, including Google, Microsoft, and Facebook. RL requires considerable investment in research, as the field is still growing toward the point where data scientists can take prescribed methods and apply them to a problem domain.


Learning to Play No-Press Diplomacy with Best Response Policy Iteration

arXiv.org Artificial Intelligence

Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and Starcraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process yields consistent improvements.
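
For readers unfamiliar with fictitious play, the idea the abstract's policy iteration methods approximate, here is a minimal sketch of the classic textbook procedure on rock-paper-scissors. It is not the paper's method: the game, the `PAYOFF` table, and the function names are assumptions chosen for illustration, and the best response here is an exact argmax over three actions rather than an approximate operator over a combinatorial action space.

```python
# Classic two-player fictitious play on rock-paper-scissors.
# Illustrative only: this is the textbook procedure, not the paper's method.

# PAYOFF[i][j] is a player's payoff for choosing action i against action j
# (0 = rock, 1 = paper, 2 = scissors). The game is zero-sum and symmetric,
# so the same table serves both players.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def best_response(opponent_counts):
    """Return the action with the highest total payoff against the opponent's
    empirical action counts (ties go to the lowest action index)."""
    def value(action):
        return sum(PAYOFF[action][j] * n for j, n in enumerate(opponent_counts))
    return max(range(3), key=value)


def fictitious_play(rounds):
    """Run simultaneous fictitious play and return both empirical strategies."""
    counts = [[0, 0, 0], [0, 0, 0]]  # counts[p][a]: times player p chose a
    for _ in range(rounds):
        # Each player best-responds to the other's history so far.
        a0 = best_response(counts[1])
        a1 = best_response(counts[0])
        counts[0][a0] += 1
        counts[1][a1] += 1
    return [[n / rounds for n in c] for c in counts]


if __name__ == "__main__":
    for player, strategy in enumerate(fictitious_play(20000)):
        print(player, [round(p, 3) for p in strategy])
```

In this zero-sum game the empirical action frequencies drift toward the uniform mixed strategy, which is the game's equilibrium. The many-player, general-sum setting of Diplomacy offers no such simple guarantee, which is part of what makes it hard.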


Assessment of Reward Functions for Reinforcement Learning Traffic Signal Control under Real-World Limitations

arXiv.org Artificial Intelligence

Adaptive traffic signal control is one key avenue for mitigating the growing consequences of traffic congestion. Incumbent solutions such as SCOOT and SCATS require regular and time-consuming calibration, can't optimise well for multiple road use modalities, and require the manual curation of many implementation plans. A recent alternative to these approaches is deep reinforcement learning, in which an agent learns how to take the most appropriate action for a given state of the system. This is guided by neural networks approximating a reward function that provides feedback to the agent regarding the performance of the actions taken, making it sensitive to the specific reward function chosen. Several authors have surveyed the reward functions used in the literature, but attributing outcome differences to reward function choice across works is problematic as there are many uncontrolled differences, as well as different outcome metrics. This paper compares the performance of agents using different reward functions in a simulation of a junction in Greater Manchester, UK, across various demand profiles, subject to real-world constraints: realistic sensor inputs, controllers, calibrated demand, intergreen times and stage sequencing. The reward metrics considered are based on the time spent stopped, lost time, change in lost time, average speed, queue length, junction throughput and variations of these magnitudes. The performance of these reward functions is compared in terms of total waiting time. We find that speed maximisation resulted in the lowest average waiting times across all demand levels, displaying significantly better performance than other rewards previously introduced in the literature.
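
To make the comparison concrete, the sketch below shows how three of the reward families named in the abstract (queue length, average speed, junction throughput) might be computed from a single observation of a junction. Everything here is an assumption for illustration: the `JunctionSnapshot` type, its fields, the stopped-speed threshold, and the speed limit are hypothetical and are not taken from the paper or from any simulator's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JunctionSnapshot:
    """Hypothetical per-step observation of one junction (not from the paper)."""
    speeds_mps: List[float]         # current speed of each vehicle, in m/s
    vehicles_discharged: int        # vehicles that cleared the junction this step
    speed_limit_mps: float = 13.4   # assumed limit (roughly 30 mph)
    stopped_below_mps: float = 0.1  # assumed threshold for "stopped"


def queue_length_reward(s: JunctionSnapshot) -> float:
    """Penalise the number of vehicles currently counted as stopped."""
    return -float(sum(1 for v in s.speeds_mps if v < s.stopped_below_mps))


def average_speed_reward(s: JunctionSnapshot) -> float:
    """Mean speed as a fraction of the limit; an empty junction scores 1.0."""
    if not s.speeds_mps:
        return 1.0
    mean = sum(s.speeds_mps) / len(s.speeds_mps)
    return min(mean / s.speed_limit_mps, 1.0)


def throughput_reward(s: JunctionSnapshot) -> float:
    """Reward the number of vehicles discharged during this step."""
    return float(s.vehicles_discharged)


if __name__ == "__main__":
    snap = JunctionSnapshot(speeds_mps=[0.0, 0.05, 6.7, 13.4], vehicles_discharged=3)
    print(queue_length_reward(snap), average_speed_reward(snap), throughput_reward(snap))
```

Each function maps the same observation to a different scalar, so an agent trained on one can behave quite differently from an agent trained on another, which is why the abstract stresses comparing them under identical conditions.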


Decision-making for Autonomous Vehicles on Highway: Deep Reinforcement Learning with Continuous Action Horizon

arXiv.org Artificial Intelligence

A decision-making strategy for autonomous vehicles describes a sequence of driving maneuvers to achieve a certain navigational mission. This paper utilizes the deep reinforcement learning (DRL) method to address the continuous-horizon decision-making problem on the highway. First, the vehicle kinematics and driving scenario on the freeway are introduced. The running objective of the ego automated vehicle is to execute an efficient and smooth policy without collision. Then, the particular algorithm named proximal policy optimization (PPO)-enhanced DRL is illustrated. To overcome the challenges of slow training and sample inefficiency, this applied algorithm could realize high learning efficiency and excellent control performance. Finally, the PPO-DRL-based decision-making strategy is evaluated from multiple perspectives, including the optimality, learning efficiency, and adaptability. Its potential for online application is discussed by applying it to similar driving scenarios.
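
The abstract does not spell out the PPO variant used, so the following is only a sketch of the standard clipped surrogate objective that PPO is generally known for, written in plain Python. The function name, argument names, and the default `clip_eps=0.2` are assumptions for illustration and should not be read as the paper's configuration.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch of samples.

    Each sample contributes min(r * A, clip(r, 1 - eps, 1 + eps) * A), where
    r is the probability ratio between the new and old policy for the action
    taken and A is the advantage estimate. Expects a non-empty batch.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum removes any incentive to push the ratio far
        # outside the clipping range in the direction the advantage favours.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # A ratio of 1.5 with a positive advantage is capped at 1 + clip_eps.
    print(ppo_clipped_objective([math.log(1.5)], [0.0], [1.0]))
```

In practice this quantity is maximised by gradient ascent on the policy parameters; the sketch only evaluates it for given log-probabilities.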


Constrained Markov Decision Processes via Backward Value Functions

arXiv.org Machine Learning

Although Reinforcement Learning (RL) algorithms have found tremendous success in simulated domains, they often cannot directly be applied to physical systems, especially in cases where there are hard constraints to satisfy (e.g. on safety or resources). In standard RL, the agent is incentivized to explore any behavior as long as it maximizes rewards, but in the real world, undesired behavior can damage either the system or the agent in a way that breaks the learning process itself. In this work, we model the problem of learning with constraints as a Constrained Markov Decision Process and provide a new on-policy formulation for solving it. A key contribution of our approach is to translate cumulative cost constraints into state-based constraints. Through this, we define a safe policy improvement method which maximizes returns while ensuring that the constraints are satisfied at every step. We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training. We also highlight the computational advantages of this approach. The effectiveness of our approach is demonstrated on safe navigation tasks and in safety-constrained versions of MuJoCo environments, with deep neural networks.
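
As background for the cumulative cost constraints mentioned above, here is a minimal sketch of how such a constraint is commonly checked: estimate the expected discounted cost from sampled trajectories and compare it with a budget. It illustrates only the standard cumulative constraint, not the paper's state-based reformulation or its backward value functions, and all function names are hypothetical.

```python
def discounted_sum(values, gamma):
    """Return sum_t gamma**t * values[t], accumulated from the last step back."""
    total = 0.0
    for v in reversed(values):
        total = v + gamma * total
    return total


def expected_discounted_cost(cost_trajectories, gamma):
    """Monte Carlo estimate of the expected discounted cumulative cost.

    cost_trajectories is a non-empty list of per-step cost sequences, one
    sequence per sampled episode.
    """
    totals = [discounted_sum(costs, gamma) for costs in cost_trajectories]
    return sum(totals) / len(totals)


def within_budget(cost_trajectories, gamma, budget):
    """True if the estimated expected cost does not exceed the budget."""
    return expected_discounted_cost(cost_trajectories, gamma) <= budget


if __name__ == "__main__":
    episodes = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
    print(expected_discounted_cost(episodes, 0.5), within_budget(episodes, 0.5, 1.0))
```

A check of this kind only says whether a policy is safe on average over whole episodes, which is one motivation for constraints that hold at every step.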


DeepMind's Three Pillars for Building Robust Machine Learning Systems

#artificialintelligence

I recently started a new newsletter focused on AI education. TheSequence is a no-BS (meaning no hype, no news, etc.) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers and concepts. Building machine learning systems differs from traditional software development in many aspects of its lifecycle. Established software methodologies for testing, debugging and troubleshooting prove simply impractical when applied to machine learning models.


Frangula californica (California coffeeberry): Matriculating undergraduates, now, for 1/1/21, Realistic Virtual Earth for Machine Learning - WUaS News, Livestream, Q&A - i) Seeking to matriculate our 2nd undergraduate class Jan 1, 2021, and potentially with students taking WUaS Open edX courses, ii) How WUaS or edX could provide a letter to prospective employers that a student is matriculated officially at WUaS / edX and similar?, iii) Creating a single #RealisticVirtualEarth beginning w #GoogleResearchFootball for learning machine learning / AI, and with Lego Robotics too, iv) WUaS Monthly Business Meeting Minutes for 8/15 * * How to BEGIN 1 #RealisticVirtualEarth in #GoogleStreetView w #TimeSlider for learning #MachineLearning & w #LegoRobotics? #FilmTo3D App >#RealisticVirtualEarthForRobotics Google open-sources soccer reinforcement learning sim #ReinforcementLearning #GRFE

#artificialintelligence

Add Lego Robotics with similar Film-To-3D App, such as - 6.270 MIT Lego Robot Competition 1999 - https://youtu.be/SXH-bBw3uxg And with Lego weDo 2.0 too - Special WeDo 2.0 Scratch Project BOXER from Roboriseit! https://youtu.be/HjD1zAWToYU It is unlikely that I will get to contribute to this work. So please take me off all your mailing lists.


Ensuring Monotonic Policy Improvement in Entropy-regularized Value-based Reinforcement Learning

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) (Sutton and Barto 2018) has recently achieved impressive successes in fields such as robotic manipulation (OpenAI 2019), video game playing (Mnih et al. 2015) and the game of Go (Silver et al. 2016). However, compared with supervised learning that has a wide range of practical applications, RL applications have primarily been limited to casual game playing or laboratory based robotics. A crucial reason for limiting applications to these environments is that it is not guaranteed that the ...

A significant factor causing the complexity might be its excessive generality (Kakade and Langford 2002; Pirotta et al. 2013); those bounds do not focus on any particular class of value-based RL algorithms. In this paper, in order to develop more tractable bounds, we focus on an RL class known as entropy-regularized value-based methods (Azar, Gómez, and Kappen 2012; Fox, Pakman, and Tishby 2016; Haarnoja et al. 2017, 2018), where the entropies of policies are introduced ...


Imitative Planning using Conditional Normalizing Flow

arXiv.org Artificial Intelligence

We explore the application of normalizing flows for improving the performance of trajectory planning for autonomous vehicles (AVs). Normalizing flows provide an invertible mapping from a known prior distribution to a potentially complex, multi-modal target distribution and allow for fast sampling with exact PDF inference. By modeling a trajectory planner's cost manifold as an energy function we learn a scene conditioned mapping from the prior to a Boltzmann distribution over the AV control space. This mapping allows for control samples and their associated energy to be generated jointly and in parallel. We propose using neural autoregressive flow (NAF) as part of an end-to-end deep learned system that allows for utilizing sensors, map, and route information to condition the flow mapping. Finally, we demonstrate the effectiveness of our approach on real world datasets over IL and hand constructed trajectory sampling techniques.
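
The two properties the abstract relies on, fast sampling and exact density evaluation through an invertible map, can be seen in the simplest possible flow. The sketch below uses a one-dimensional affine map over a standard normal prior as a stand-in. It is far simpler than the scene-conditioned neural autoregressive flow the paper proposes, and the class and method names are assumptions for illustration.

```python
import math
import random


class AffineFlow1D:
    """Invertible map x = shift + scale * z applied to a standard normal prior."""

    def __init__(self, shift, scale):
        if scale == 0:
            raise ValueError("scale must be non-zero for the map to be invertible")
        self.shift = shift
        self.scale = scale

    def forward(self, z):
        """Map a prior sample z to the target space."""
        return self.shift + self.scale * z

    def inverse(self, x):
        """Map a target-space point back to the prior space."""
        return (x - self.shift) / self.scale

    def log_prob(self, x):
        """Exact log-density via the change-of-variables formula:
        log p(x) = log N(z; 0, 1) - log |scale|, with z = inverse(x)."""
        z = self.inverse(x)
        log_prior = -0.5 * (z * z + math.log(2.0 * math.pi))
        return log_prior - math.log(abs(self.scale))

    def sample(self, rng):
        """Draw z from the prior and push it through the flow."""
        return self.forward(rng.gauss(0.0, 1.0))


if __name__ == "__main__":
    flow = AffineFlow1D(shift=2.0, scale=3.0)
    x = flow.sample(random.Random(0))
    print(x, flow.log_prob(x))
```

Because the sample and its log-density come from the same forward pass, both can be produced together, which mirrors the abstract's point about generating control samples and their energies jointly.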


Theory of Deep Q-Learning: A Dynamical Systems Perspective

arXiv.org Artificial Intelligence

Deep Q-Learning is an important algorithm, used to solve sequential decision making problems. It involves training a Deep Neural Network, called a Deep Q-Network (DQN), to approximate a function associated with optimal decision making, the Q-function. Although wildly successful in laboratory conditions, serious gaps between theory and practice prevent its use in the real world. In this paper, we present a comprehensive analysis of the popular and practical version of the algorithm, under realistic verifiable assumptions. An important contribution is the characterization of its performance as a function of training. To do this, we view the algorithm as an evolving dynamical system. This facilitates associating a closely-related measure process with training. Then, the long-term behavior of Deep Q-Learning is determined by the limit of the aforementioned measure process. Empirical inferences, such as the qualitative advantage of using experience replay, and performance inconsistencies even after training, are explained using our analysis. Also, our theory is general and accommodates state Markov processes with multiple stationary distributions.
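
For orientation, the sketch below shows the Q-learning update and uniform experience replay in their simplest tabular form. This is a simplification: a DQN replaces the table with a neural network, and the paper analyses that practical version, not this one. The function names, the transition layout, and the default `alpha` and `gamma` values are assumptions for illustration.

```python
import random


def td_target(q, reward, next_state, done, gamma):
    """One-step target: r for terminal transitions, else r + gamma * max_a Q(s', a)."""
    if done:
        return reward
    return reward + gamma * max(q[next_state])


def q_update(q, transition, alpha=0.5, gamma=0.9):
    """Move Q(s, a) a fraction alpha of the way toward the one-step target.

    q is a table indexed as q[state][action]; transition is a tuple
    (state, action, reward, next_state, done).
    """
    state, action, reward, next_state, done = transition
    target = td_target(q, reward, next_state, done, gamma)
    q[state][action] += alpha * (target - q[state][action])


def replay_train(q, buffer, steps, rng, alpha=0.5, gamma=0.9):
    """Experience replay: repeatedly update on transitions drawn uniformly
    at random from a stored buffer instead of only the most recent one."""
    for _ in range(steps):
        q_update(q, rng.choice(buffer), alpha, gamma)


if __name__ == "__main__":
    q = [[0.0, 0.0], [1.0, 2.0]]
    q_update(q, (0, 1, 1.0, 1, False))
    print(q[0][1])
```

Sampling from a buffer breaks the correlation between consecutive updates, which is the practical benefit of replay that the abstract says its analysis explains.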