 Reinforcement Learning


Automated vehicle's behavior decision making using deep reinforcement learning and high-fidelity simulation environment

arXiv.org Artificial Intelligence

Many studies have been conducted to improve AVs' environment recognition and vehicle control, but decision making has received too little attention, and the decision algorithms proposed so far are very preliminary. Therefore, this paper puts forward a framework for training and learning decision making. It consists of two parts: the deep reinforcement learning (DRL) training program and the high-fidelity virtual simulation environment. The basic microscopic behavior, car-following (CF), is then trained within this framework. In addition, theoretical analysis and experiments were conducted on setting the reward function to accelerate DRL training. The results show that, on the premise of driving comfort, the efficiency of the trained AV increases by 7.9% compared to the classical traffic model, the intelligent driver model (IDM). Later, on a more complex three-lane section, we trained an integrated model that combines both CF and lane-changing (LC) behavior, and the average speed grows by a further 2.4%. This indicates that our framework is effective for AV decision-making learning.

Keywords: Automated vehicle; Decision making; Deep reinforcement learning; Reward function

1. Introduction

Automated vehicles have captured public attention in recent years, especially after Google announced its automated driving program in 2010, for their advantages in alleviating traffic congestion, freeing drivers' attention and conserving energy. The tasks involved in achieving autonomous driving can be divided into three modules: environment recognition, decision making and vehicle control. Among them, vehicle control shows no obvious differences between AVs and manually driven vehicles.


Learning to Navigate in Cities Without a Map

arXiv.org Artificial Intelligence

Navigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence. Long-range navigation is a complex cognitive task that relies on developing an internal representation of space, grounded by recognisable landmarks and robust visual processing, that can simultaneously support continuous self-localisation ("I am here") and a representation of the goal ("I am going there"). Building upon recent research that applies deep reinforcement learning to maze navigation problems, we present an end-to-end deep reinforcement learning approach that can be applied on a city scale. Recognising that successful navigation relies on integration of general policies with locale-specific knowledge, we propose a dual pathway architecture that allows locale-specific features to be encapsulated, while still enabling transfer to multiple cities. We present an interactive navigation environment that uses Google StreetView for its photographic content and worldwide coverage, and demonstrate that our learning method allows agents to learn to navigate multiple cities and to traverse to target destinations that may be kilometres away. A video summarizing our research and showing the trained agent in diverse city environments as well as on the transfer task is available at: https://sites.google.com/view/streetlearn.


An introduction to Deep Q-Learning: let's play Doom

@machinelearnbot

At each time step, we receive a tuple (state, action, reward, new_state). We learn from it (we feed the tuple to our neural network) and then throw this experience away. The problem is that we feed sequential samples from interactions with the environment to our neural network, and it tends to forget previous experiences as it overwrites them with new ones. For instance, if we are in the first level and then the second (which is totally different), our agent can forget how to behave in the first level.
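A common remedy for this forgetting problem is experience replay: keep past tuples in a buffer and train on random samples from it instead of only on the latest step. The excerpt above does not name a fix, so the sketch below is an assumption about the usual approach, not the article's own code. The `ReplayBuffer` class, its method names and the `capacity` and `seed` parameters are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, new_state) tuples.

    Hypothetical helper for illustration; not taken from the article.
    """

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest item once it is full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, new_state):
        # Store the experience instead of discarding it after one update.
        self.buffer.append((state, action, reward, new_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement mixes old and new experiences,
        # so a training batch is no longer a run of consecutive steps.
        batch_size = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Usage: five toy experiences go into a buffer that holds only three.
buf = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buf.add(state=step, action=0, reward=1.0, new_state=step + 1)
batch = buf.sample(2)
```

In a training loop, the network would be updated on `batch` rather than on the single most recent tuple, which lets experiences from earlier levels keep appearing while they remain in the buffer.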


Regret Bounds for Model-Free Linear Quadratic Control

arXiv.org Machine Learning

Reinforcement learning (RL) algorithms have recently shown impressive performance in many challenging decision making problems, including game playing and various robotic tasks. Model-based RL approaches estimate a model of the transition dynamics and rely on the model to plan future actions using approximate dynamic programming. Model-free approaches aim to find an optimal policy without explicitly modeling the system transitions; they estimate state-action value functions or directly optimize a parameterized policy based only on interactions with the environment. Model-free RL is appealing for a number of reasons: 1) it is an "end-to-end" approach, directly optimizing the cost function of interest, 2) it can be used in settings where a model is not available and the agent only has access to a simulator, and 3) it is easy to implement. However, while model-based algorithms have been studied extensively in RL and control literature and can provide strong theoretical guarantees, model-free algorithms are not as well-understood.
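As a generic illustration of estimating a state-action value function from interaction alone, here is a minimal sketch of one tabular Q-learning update. It is not the algorithm analysed in the paper, which concerns linear quadratic control. The function name, the state and action labels, and the `alpha` and `gamma` values are all assumptions chosen for the example.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    No transition model is used: the update needs only one observed
    (state, action, reward, next_state) sample.
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage with hypothetical states "s0", "s1" and a table initialised to zero.
q = defaultdict(float)
actions = ["left", "right"]
first = q_learning_update(q, "s0", "right", 1.0, "s1", actions)
second = q_learning_update(q, "s1", "left", 0.0, "s0", actions)
```

The second call receives no reward, yet its estimate still becomes positive because it bootstraps from the value learned for `("s0", "right")` in the first call.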


On the Convergence of Competitive, Multi-Agent Gradient-Based Learning

arXiv.org Machine Learning

As learning algorithms are increasingly deployed in markets and other competitive environments, understanding their dynamics is becoming increasingly important. We study the limiting behavior of competitive agents employing gradient-based learning algorithms. Specifically, we introduce a general framework for competitive gradient-based learning that encompasses a wide breadth of learning algorithms including policy gradient reinforcement learning, gradient-based bandits, and certain online convex optimization algorithms. We show that unlike the single agent case, gradient learning schemes in competitive settings do not necessarily correspond to gradient flows and, hence, it is possible for limiting behaviors like periodic orbits to exist. We introduce a new class of games, Morse-Smale games, that correspond to gradient-like flows. We provide guarantees that competitive gradient-based learning algorithms (both in the full information and gradient-free settings) avoid linearly unstable critical points (i.e. strict saddle points and unstable limit cycles). Since generic local Nash equilibria are not unstable critical points (that is, in a formal mathematical sense, almost all Nash equilibria are not strict saddles), these results imply that gradient-based learning almost surely does not get stuck at critical points that do not correspond to Nash equilibria. For Morse-Smale games, we show that competitive gradient learning converges to linearly stable cycles (which include stable Nash equilibria) almost surely. Finally, we specialize these results to commonly used multi-agent learning algorithms and provide illustrative examples that demonstrate the wide range of limiting behaviors competitive gradient learning exhibits.
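The claim that competitive gradient learning need not behave like a gradient flow can be seen in a textbook two-player example, sketched here under stated assumptions: the zero-sum game f(x, y) = x * y, the step size and the starting point are chosen for illustration and are not taken from the paper. In continuous time the joint dynamics rotate around the equilibrium at the origin, giving periodic orbits. With a discrete step size `eta`, each simultaneous update multiplies the distance to the origin by sqrt(1 + eta^2), so the iterates spiral outward instead of converging.

```python
import math


def simultaneous_gradient_play(x, y, eta, steps):
    """Both players take simultaneous gradient steps on their own cost.

    Player 1 controls x and minimizes f(x, y) = x * y.
    Player 2 controls y and minimizes -f(x, y).
    Returns the list of joint iterates, starting with the initial point.
    """
    trajectory = [(x, y)]
    for _ in range(steps):
        grad_x = y    # derivative of  x * y with respect to x
        grad_y = -x   # derivative of -x * y with respect to y
        # Simultaneous update: both gradients use the old (x, y).
        x, y = x - eta * grad_x, y - eta * grad_y
        trajectory.append((x, y))
    return trajectory


traj = simultaneous_gradient_play(1.0, 0.0, eta=0.1, steps=100)
# Distance of each joint iterate from the equilibrium at the origin.
radii = [math.hypot(px, py) for px, py in traj]
```

A single agent running gradient descent on a fixed cost cannot cycle like this, because its cost decreases along the flow. Here no single function decreases along the joint trajectory, which is why rotation is possible.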


My Journey to Reinforcement Learning -- Part 0: Introduction

#artificialintelligence

When we google reinforcement learning, we see images like the one above over and over again. So rather than seeing an agent or environment, let's actually think about this as a process where a baby is learning how to walk. "The "problem statement" of the example is to walk, where the child is an agent trying to manipulate the environment (which is the surface on which it walks) by taking actions (viz walking) and he/she tries to go from one state (viz each step he/she takes) to another. The child gets a reward (let's say chocolate) when he/she accomplishes a sub module of the task (viz taking a couple of steps) and will not receive any chocolate (a.k.a. negative reward) when he/she is not able to walk. This is a simplified description of a reinforcement learning problem."


Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments

arXiv.org Artificial Intelligence

Recently, multiagent deep reinforcement learning (DRL) has received increasingly wide attention. Existing multiagent DRL algorithms are inefficient when faced with the non-stationarity that arises because agents update their policies simultaneously in stochastic cooperative environments. This paper extends the recently proposed weighted double estimator to the multiagent domain and proposes a multiagent DRL framework, named weighted double deep Q-network (WDDQN). By utilizing the weighted double estimator and the deep neural network, WDDQN can not only reduce the bias effectively but also be extended to scenarios with raw visual inputs. To achieve efficient cooperation in the multiagent domain, we introduce the lenient reward network and the scheduled replay strategy. Experiments show that WDDQN outperforms the existing DRL and multiagent DRL algorithms, i.e., double DQN and lenient Q-learning, in terms of the average reward and the convergence rate in stochastic cooperative environments.


CytonRL: an Efficient Reinforcement Learning Open-source Toolkit Implemented in C++

arXiv.org Artificial Intelligence

This paper presents an open-source reinforcement learning toolkit named CytonRL (https://github.com/arthurxlw/cytonRL). The toolkit implements four recent advanced deep Q-learning algorithms from scratch using C++ and NVIDIA's GPU-accelerated libraries. The code is simple and elegant, owing to an open-source general-purpose neural network library named CytonLib. Benchmarks show that the toolkit achieves competitive performance on the popular Atari game of Breakout.


Optimizing Interactive Systems with Data-Driven Objectives

arXiv.org Artificial Intelligence

Effective optimization is essential for interactive systems to provide a satisfactory user experience. However, it is often challenging to find an objective to optimize for. Generally, such objectives are manually crafted and rarely capture complex user needs accurately. Conversely, we propose an approach that infers the objective directly from observed user interactions. These inferences can be made regardless of prior knowledge and across different types of user behavior. We then introduce Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization. Our main contribution is a new general principled approach to optimizing interactive systems using data-driven objectives. We demonstrate the high effectiveness of ISO over several GridWorld simulations.


Why Inverse Reinforcement Learning Is GOLD!

#artificialintelligence

Inverse Reinforcement Learning (IRL) is not something very new. It popped up with work published by Andrew Ng in the year 2000. It has since developed over the last nine years with different kinds of base algorithms (IRL optimization algorithms). If any of you are interested in reading about the history of this fantastic field, I highly recommend you follow this PERFECT GitHub repo, which contains all the paper notes from the year 2000. When it comes to solving sequential decision-making problems, Reinforcement Learning (RL) is a prevalent method.