AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

A Brief Survey of Deep Reinforcement Learning

Arulkumaran, Kai, Deisenroth, Marc Peter, Brundage, Miles, Bharath, Anil Anthony

arXiv.org Machine Learning, Sep-28-2017

Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher-level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep $Q$-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

doi: 10.1109/MSP.2017.2743240

1708.05866

Country: Europe (0.46)

Genre: Overview (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education (1.00)
Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The detour problem in a stochastic environment: Tolman revisited

Fakhari, Pegah, Khodadadi, Arash, Busemeyer, Jerome

arXiv.org Machine Learning, Sep-27-2017

We designed a grid world task to study human planning and re-planning behavior in an unknown stochastic environment. In our grid world, participants were asked to travel from a random starting point to a random goal position while maximizing their reward. Because they were not familiar with the environment, they needed to learn its characteristics from experience to plan optimally. Later in the task, we randomly blocked the optimal path to investigate whether and how people adjust their original plans to find a detour. To this end, we developed and compared 12 different models. These models differed in how they learned and represented the environment and in how they planned to reach the goal. The majority of our participants were able to plan optimally. We also showed that people were capable of revising their plans when an unexpected event occurred. The results of the model comparison showed that the model-based reinforcement learning approach provided the best account of the data and outperformed heuristics in explaining the behavioral data in the re-planning trials.
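The planning and re-planning idea in this abstract can be illustrated with a small sketch. It is not one of the paper's 12 models: it assumes a deterministic grid with a fully known model, a step cost of -1, and hypothetical helper names (`value_iteration`, `greedy_path`), and it simply re-runs value iteration after a cell on the best path is blocked.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def value_iteration(rows: int, cols: int, goal: Cell, blocked: Set[Cell],
                    step_cost: float = -1.0, sweeps: int = 100) -> Dict[Cell, float]:
    """State values for a deterministic grid whose layout is assumed known."""
    cells = [(r, c) for r in range(rows) for c in range(cols)
             if (r, c) not in blocked]
    values = {cell: 0.0 for cell in cells}
    for _ in range(sweeps):
        for cell in cells:
            if cell == goal:
                continue  # the goal is terminal and keeps value 0
            best = float("-inf")
            for dr, dc in MOVES:
                nxt = (cell[0] + dr, cell[1] + dc)
                if nxt not in values:
                    nxt = cell  # walls and blocked cells leave the agent in place
                best = max(best, step_cost + values[nxt])
            values[cell] = best
    return values


def greedy_path(values: Dict[Cell, float], start: Cell, goal: Cell,
                max_steps: int = 100) -> List[Cell]:
    """Follow the highest-valued neighbouring cell from start until the goal."""
    path = [start]
    cell = start
    while cell != goal and len(path) <= max_steps:
        neighbours = [(cell[0] + dr, cell[1] + dc) for dr, dc in MOVES]
        neighbours = [n for n in neighbours if n in values]
        cell = max(neighbours, key=lambda n: values[n])
        path.append(cell)
    return path


# Plan once, then block a cell on the planned route and plan again.
start, goal = (0, 0), (0, 2)
original = greedy_path(value_iteration(3, 3, goal, set()), start, goal)
detour = greedy_path(value_iteration(3, 3, goal, {(0, 1)}), start, goal)
```

In the stochastic setting the paper studies, the transition model would have to be estimated from experience and the backup would average over possible next cells; the sketch leaves that out for brevity.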

artificial intelligence, machine learning, reinforcement learning, (21 more...)

1709.09761

Country: North America > United States (0.93)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Virtual to Real Reinforcement Learning for Autonomous Driving

Pan, Xinlei, You, Yurong, Wang, Ziyan, Lu, Cewu

arXiv.org Artificial Intelligence, Sep-26-2017

Reinforcement learning is considered a promising direction for learning driving policies. However, training an autonomous vehicle with reinforcement learning in a real environment involves unaffordable trial and error. It is more desirable to first train in a virtual environment and then transfer to the real environment. In this paper, we propose a novel realistic translation network that makes a model trained in a virtual environment workable in the real world. The proposed network can convert non-realistic virtual image input into a realistic one with similar scene structure. Given realistic frames as input, a driving policy trained by reinforcement learning can adapt well to real-world driving. Experiments show that our proposed virtual to real (VR) reinforcement learning (RL) works well. To our knowledge, this is the first successful case of a driving policy trained by reinforcement learning that can adapt to real-world driving data.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1704.03952

Country:

Asia > China (0.28)
North America > United States > California (0.28)

Genre: Research Report (0.83)

Industry:

Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.86)
Information Technology > Robotics & Automation (0.72)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

An Optimal Online Method of Selecting Source Policies for Reinforcement Learning

Li, Siyuan, Zhang, Chongjie

arXiv.org Machine Learning, Sep-24-2017

Transfer learning significantly accelerates the reinforcement learning process by exploiting relevant knowledge from previous experiences. The problem of optimally selecting source policies during the learning process is of great importance yet challenging. There has been little theoretical analysis of this problem. In this paper, we develop an optimal online method to select source policies for reinforcement learning. This method formulates online source policy selection as a multi-armed bandit problem and augments Q-learning with policy reuse. We provide theoretical guarantees of the optimal selection process and convergence to the optimal policy. In addition, we conduct experiments on a grid-based robot navigation domain to demonstrate its efficiency and robustness by comparing to the state-of-the-art transfer learning method.
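To make the bandit formulation in this abstract concrete, here is a minimal sketch in which each source policy is treated as one arm. The abstract does not say which bandit algorithm the authors use; UCB1 is swapped in here purely for illustration, and the class name `UCB1Selector` and the idea of feeding it episode returns scaled to [0, 1] are assumptions.

```python
import math
from typing import List


class UCB1Selector:
    """Pick among source policies (arms) with the UCB1 rule.

    Illustrative only: the paper's own selection rule may differ.
    """

    def __init__(self, n_policies: int) -> None:
        self.counts: List[int] = [0] * n_policies    # times each policy was reused
        self.means: List[float] = [0.0] * n_policies  # running mean return per policy

    def select(self) -> int:
        # Try every policy once before trusting the confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, episode_return: float) -> None:
        # Incremental mean of the returns observed when reusing this policy.
        self.counts[arm] += 1
        self.means[arm] += (episode_return - self.means[arm]) / self.counts[arm]
```

In a full agent, the selected source policy would guide exploration for one episode of Q-learning, and the return of that episode would be passed back through `update`.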

machine learning, reinforcement learning, source policy, (18 more...)

1709.08201

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement Learning for Predictive Analytics in Smart Cities

@machinelearnbot, Sep-23-2017, 00:15:19 GMT

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

data mining, machine learning, reinforcement learning, (10 more...)

Industry: Information Technology (0.39)

Technology:

Information Technology > Data Science > Data Mining (0.76)
Information Technology > Internet of Things (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.43)

Generalized Quantum Reinforcement Learning with Quantum Technologies

Cárdenas-López, F. A., Lamata, L., Retamal, J. C., Solano, E.

arXiv.org Machine Learning, Sep-22-2017

We propose a protocol to perform generalized quantum reinforcement learning with quantum technologies. At variance with recent results on quantum reinforcement learning with superconducting circuits [L. Lamata, Sci. Rep. 7, 1609 (2017)], in our current protocol coherent feedback during the learning process is not required, enabling its implementation in a wide variety of quantum systems. We consider diverse possible scenarios for an agent, an environment, and a register that connects them, involving multiqubit and multilevel systems, as well as open-system dynamics. We finally propose possible implementations of this protocol in trapped ions and superconducting circuits. The field of quantum reinforcement learning with quantum technologies will enable enhanced quantum control, as well as more efficient machine learning calculations.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1709.07848

Country:

South America (0.28)
Europe > Spain (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

On overfitting and asymptotic bias in batch reinforcement learning with partial observability

Francois-Lavet, Vincent, Ernst, Damien, Fonteneau, Raphael

arXiv.org Machine Learning, Sep-22-2017

This paper stands in the context of reinforcement learning with partial observability and limited data. In this setting, we focus on the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data), and theoretically show that while potentially increasing the asymptotic bias, a smaller state representation decreases the risk of overfitting. Our analysis relies on expressing the quality of a state representation by bounding L1 error terms of the associated belief states. Theoretical results are empirically illustrated when the state representation is a truncated history of observations. Finally, we also discuss and empirically illustrate how using function approximators and adapting the discount factor may enhance the tradeoff between asymptotic bias and overfitting.
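The state representation mentioned in this abstract, a truncated history of observations, can be sketched as a fixed-length buffer. The class name `TruncatedHistory` is hypothetical, and the sketch assumes the history holds observations only; the paper's representation may also include actions.

```python
from collections import deque
from typing import Deque, Hashable, Tuple


class TruncatedHistory:
    """Keep the last `length` observations and expose them as a hashable state."""

    def __init__(self, length: int) -> None:
        if length < 1:
            raise ValueError("length must be at least 1")
        self._buffer: Deque[Hashable] = deque(maxlen=length)

    def reset(self) -> None:
        # Call at the start of each episode so histories do not leak across episodes.
        self._buffer.clear()

    def push(self, observation: Hashable) -> Tuple[Hashable, ...]:
        # The deque silently drops the oldest observation once it is full.
        self._buffer.append(observation)
        return tuple(self._buffer)
```

A shorter `length` gives fewer distinct states to estimate from limited data, at the cost of discarding older information, which mirrors the tradeoff between overfitting and asymptotic bias described in the abstract.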

asymptotic bias, machine learning, reinforcement learning, (15 more...)

1709.07796

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.98)

Learning Complex Swarm Behaviors by Exploiting Local Communication Protocols with Deep Reinforcement Learning

Hüttenrauch, Maximilian, Šošić, Adrian, Neumann, Gerhard

arXiv.org Machine Learning, Sep-21-2017

Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. Although there have been recent advances of deep RL algorithms applied to multi-agent systems, learning communication protocols while simultaneously learning the behavior of the agents is still beyond the reach of deep RL algorithms. However, while it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building, building a communication link, and pushing an intruder. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.

I. INTRODUCTION

Nature provides many examples where the performance of a collective of limited beings exceeds the capabilities of one individual. Ants transport prey of the size no single ant could carry, termites build nests of up to nine meters in height, and bees are able to regulate the temperature of a hive.
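The histogram-based protocols described in the abstract above can be illustrated with a simple distance histogram over nearby agents. This is a sketch under assumptions, not the paper's encoding: the function name `distance_histogram`, the use of equal-width distance bins, and the communication radius parameter are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def distance_histogram(agent: Point, neighbours: Sequence[Point],
                       comm_radius: float, n_bins: int) -> List[int]:
    """Count neighbours per distance bin within the communication radius."""
    counts = [0] * n_bins
    bin_width = comm_radius / n_bins
    for x, y in neighbours:
        distance = math.hypot(x - agent[0], y - agent[1])
        if distance >= comm_radius:
            continue  # agents out of range are not sensed at all
        # Clamp the index to guard against floating-point rounding at the edge.
        index = min(int(distance / bin_width), n_bins - 1)
        counts[index] += 1
    return counts
```

The resulting vector has a fixed length regardless of how many agents are nearby, which is what makes it convenient as an input to a policy network.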

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1709.07224

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Education (0.46)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Feature Engineering for Predictive Modeling using Reinforcement Learning

Khurana, Udayan, Samulowitz, Horst, Turaga, Deepak

arXiv.org Machine Learning, Sep-21-2017

Feature engineering is a crucial step in the process of predictive modeling. It involves the transformation of a given feature space, typically using mathematical functions, with the objective of reducing the modeling error for a given target. However, there is no well-defined basis for performing effective feature engineering. It involves domain knowledge, intuition, and most of all, a lengthy process of trial and error. The human attention involved in overseeing this process significantly influences the cost of model generation. We present a new framework to automate feature engineering. It is based on performance-driven exploration of a transformation graph, which systematically and compactly enumerates the space of given options. A highly efficient exploration strategy is derived through reinforcement learning on past examples.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1709.0715

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks

Sharifzadeh, Sahand, Chiotellis, Ioannis, Triebel, Rudolph, Cremers, Daniel

arXiv.org Artificial Intelligence, Sep-21-2017

We propose an inverse reinforcement learning (IRL) approach using Deep Q-Networks to extract the rewards in problems with large state spaces. We evaluate the performance of this approach in a simulation-based autonomous driving scenario. Our results resemble the intuitive relation between the reward function and readings of distance sensors mounted at different poses on the car. We also show that, after a few learning rounds, our simulated agent generates collision-free motions and performs human-like lane change behaviour.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1612.03653

Country:

Europe > Germany (0.14)
Europe > Spain (0.14)

Genre: Research Report > New Finding (0.35)

Industry: Transportation > Ground > Road (0.89)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)