Goto

Collaborating Authors

 Reinforcement Learning


A Brief Survey of Deep Reinforcement Learning

arXiv.org Machine Learning

Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep $Q$-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.


The detour problem in a stochastic environment: Tolman revisited

arXiv.org Machine Learning

We designed a grid world task to study human planning and re-planning behavior in an unknown stochastic environment. In our grid world, participants were asked to travel from a random starting point to a random goal position while maximizing their reward. Because they were not familiar with the environment, they needed to learn its characteristics from experience to plan optimally. Later in the task, we randomly blocked the optimal path to investigate whether and how people adjust their original plans to find a detour. To this end, we developed and compared 12 different models. These models were different on how they learned and represented the environment and how they planned to catch the goal. The majority of our participants were able to plan optimally. We also showed that people were capable of revising their plans when an unexpected event occurred. The result from the model comparison showed that the model-based reinforcement learning approach provided the best account for the data and outperformed heuristics in explaining the behavioral data in the re-planning trials.


Virtual to Real Reinforcement Learning for Autonomous Driving

arXiv.org Artificial Intelligence

Reinforcement learning is considered as a promising direction for driving policy learning. However, training autonomous driving vehicle with reinforcement learning in real environment involves non-affordable trial-and-error. It is more desirable to first train in a virtual environment and then transfer to the real environment. In this paper, we propose a novel realistic translation network to make model trained in virtual environment be workable in real world. The proposed network can convert non-realistic virtual image input into a realistic one with similar scene structure. Given realistic frames as input, driving policy trained by reinforcement learning can nicely adapt to real world driving. Experiments show that our proposed virtual to real (VR) reinforcement learning (RL) works pretty well. To our knowledge, this is the first successful case of driving policy trained by reinforcement learning that can adapt to real world driving data.


An Optimal Online Method of Selecting Source Policies for Reinforcement Learning

arXiv.org Machine Learning

Transfer learning significantly accelerates the reinforcement learning process by exploiting relevant knowledge from previous experiences. The problem of optimally selecting source policies during the learning process is of great importance yet challenging. There has been little theoretical analysis of this problem. In this paper, we develop an optimal online method to select source policies for reinforcement learning. This method formulates online source policy selection as a multi-armed bandit problem and augments Q-learning with policy reuse. We provide theoretical guarantees of the optimal selection process and convergence to the optimal policy. In addition, we conduct experiments on a grid-based robot navigation domain to demonstrate its efficiency and robustness by comparing to the state-of-the-art transfer learning method.


Reinforcement Learning for Predictive Analytics in Smart Cities

@machinelearnbot

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Generalized Quantum Reinforcement Learning with Quantum Technologies

arXiv.org Machine Learning

We propose a protocol to perform generalized quantum reinforcement learning with quantum technologies. At variance with recent results on quantum reinforcement learning with superconducting circuits [L. Lamata, Sci. Rep. 7, 1609 (2017)], in our current protocol coherent feedback during the learning process is not required, enabling its implementation in a wide variety of quantum systems. We consider diverse possible scenarios for an agent, an environment, and a register that connects them, involving multiqubit and multilevel systems, as well as open-system dynamics. We finally propose possible implementations of this protocol in trapped ions and superconducting circuits. The field of quantum reinforcement learning with quantum technologies will enable enhanced quantum control, as well as more efficient machine learning calculations.


On overfitting and asymptotic bias in batch reinforcement learning with partial observability

arXiv.org Machine Learning

This paper stands in the context of reinforcement learning with partial observability and limited data. In this setting, we focus on the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data), and theoretically show that while potentially increasing the asymptotic bias, a smaller state representation decreases the risk of overfitting. Our analysis relies on expressing the quality of a state representation by bounding L1 error terms of the associated belief states. Theoretical results are empirically illustrated when the state representation is a truncated history of observations. Finally, we also discuss and empirically illustrate how using function approximators and adapting the discount factor may enhance the tradeoff between asymptotic bias and overfitting.


Learning Complex Swarm Behaviors by Exploiting Local Communication Protocols with Deep Reinforcement Learning

arXiv.org Machine Learning

Abstract-- Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. Although there have been recent advances of deep RL algorithms applied to multi-agent systems, learning communication protocols while simultaneously learning the behavior of the agents is still beyond the reach of deep RL algorithms. However, while it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building, building a communication link, and pushing an intruder. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols. I. INTRODUCTION Nature provides many examples where the performance of a collective of limited beings exceeds the capabilities of one individual. Ants transport prey of the size no single ant could carry, termites build nests of up to nine meters in height, and bees are able to regulate the temperature of a hive.


Feature Engineering for Predictive Modeling using Reinforcement Learning

arXiv.org Machine Learning

Feature engineering is a crucial step in the process of predictive modeling. It involves the transformation of given feature space, typically using mathematical functions, with the objective of reducing the modeling error for a given target. However, there is no well-defined basis for performing effective feature engineering. It involves domain knowledge, intuition, and most of all, a lengthy process of trial and error. The human attention involved in overseeing this process significantly influences the cost of model generation. We present a new framework to automate feature engineering. It is based on performance driven exploration of a transformation graph, which systematically and compactly enumerates the space of given options. A highly efficient exploration strategy is derived through reinforcement learning on past examples.


Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks

arXiv.org Artificial Intelligence

We propose an inverse reinforcement learning (IRL) approach using Deep Q-Networks to extract the rewards in problems with large state spaces. We evaluate the performance of this approach in a simulation-based autonomous driving scenario. Our results resemble the intuitive relation between the reward function and readings of distance sensors mounted at different poses on the car. We also show that, after a few learning rounds, our simulated agent generates collision-free motions and performs human-like lane change behaviour.