AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Optimism in Reinforcement Learning with Generalized Linear Function Approximation

Wang, Yining, Wang, Ruosong, Du, Simon S., Krishnamurthy, Akshay

arXiv.org Machine LearningDec-9-2019

We design a new provably efficient algorithm for episodic reinforcement learning with generalized linear function approximation. We analyze the algorithm under a new expressivity assumption that we call "optimistic closure," which is strictly weaker than assumptions from prior analyses for the linear setting. With optimistic closure, we prove that our algorithm enjoys a regret bound of $\tilde{O}(\sqrt{d^3 T})$ where $d$ is the dimensionality of the state-action features and $T$ is the number of episodes. This is the first statistically and computationally efficient algorithm for reinforcement learning with generalized linear functions.

algorithm, assumption, function approximation, (12 more...)

arXiv.org Machine Learning

1912.04136

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.62)

Add feedback

Learning Sparse Representations Incrementally in Deep Reinforcement Learning

Hernandez-Garcia, J. Fernando, Sutton, Richard S.

arXiv.org Machine LearningDec-9-2019

Sparse representations have been shown to be useful in deep reinforcement learning for mitigating catastrophic interference and improving the performance of agents in terms of cumulative reward. Previous results were based on a two step process were the representation was learned offline and the action-value function was learned online afterwards. In this paper, we investigate if it is possible to learn a sparse representation and the action-value function simultaneously and incrementally. We investigate this question by employing several regularization techniques and observing how they affect sparsity of the representation learned by a DQN agent in two different benchmark domains. Our results show that with appropriate regularization it is possible to increase the sparsity of the representations learned by DQN agents. Moreover, we found that learning sparse representations also resulted in improved performance in terms of cumulative reward. Finally, we found that the performance of the agents that learned a sparse representation was more robust to the size of the experience replay buffer. This last finding supports the long standing hypothesis that the overlap in representations learned by deep neural networks is the leading cause of catastrophic interference.

activation overlap, representation, sparse representation, (14 more...)

arXiv.org Machine Learning

1912.04002

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep RL

Atrey, Akanksha, Clary, Kaleigh, Jensen, David

arXiv.org Artificial IntelligenceDec-9-2019

Saliency maps have been used to support explanations of deep reinforcement learning (RL) agent behavior over temporally extended sequences. However, their use in the community indicates that the explanations derived from saliency maps are often unfalsifiable and can be highly subjective. We introduce an empirical approach grounded in counterfactual reasoning to test the hypotheses generated from saliency maps and assess the degree to which saliency maps represent semantics of RL environments. We evaluate three types of saliency maps using Atari games, a common benchmark for deep RL. Our results show the extent to which existing claims about Atari games can be evaluated and suggest that saliency maps are an exploratory tool not an explanatory tool.

agent, saliency, saliency map, (15 more...)

arXiv.org Artificial Intelligence

1912.05743

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Genre:

Research Report > New Finding (0.86)
Research Report > Experimental Study (0.67)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Deep Bayesian Reward Learning from Preferences

Brown, Daniel S., Niekum, Scott

arXiv.org Artificial IntelligenceDec-9-2019

Bayesian inverse reinforcement learning (IRL) methods are ideal for safe imitation learning, as they allow a learning agent to reason about reward uncertainty and the safety of a learned policy. However, Bayesian IRL is computationally intractable for high-dimensional problems because each sample from the posterior requires solving an entire Markov Decision Process (MDP). While there exist non-Bayesian deep IRL methods, these methods typically infer point estimates of reward functions, precluding rigorous safety and uncertainty analysis. We propose Bayesian Reward Extrapolation (B-REX), a highly efficient, preference-based Bayesian reward learning algorithm that scales to high-dimensional, visual control tasks. Our approach uses successor feature representations and preferences over demonstrations to efficiently generate samples from the posterior distribution over the demonstrator's reward function without requiring an MDP solver. Using samples from the posterior, we demonstrate how to calculate high-confidence bounds on policy performance in the imitation learning setting, in which the ground-truth reward function is unknown. We evaluate our proposed approach on the task of learning to play Atari games via imitation learning from pixel inputs, with no access to the game score. We demonstrate that B-REX learns imitation policies that are competitive with a state-of-the-art deep imitation learning method that only learns a point estimate of the reward function. Furthermore, we demonstrate that samples from the posterior generated via B-REX can be used to compute high-confidence performance bounds for a variety of evaluation policies. We show that high-confidence performance bounds are useful for accurately ranking different evaluation policies when the reward function is unknown. We also demonstrate that high-confidence performance bounds may be useful for detecting reward hacking.

demonstration, imitation, reward function, (13 more...)

arXiv.org Artificial Intelligence

1912.04472

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > Canada (0.04)

Genre: Research Report (0.82)

Industry:

Education (0.93)
Leisure & Entertainment > Games > Computer Games (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

Add feedback

Unsupervised Curricula for Visual Meta-Reinforcement Learning

Jabri, Allan, Hsu, Kyle, Eysenbach, Ben, Gupta, Abhishek, Levine, Sergey, Finn, Chelsea

arXiv.org Artificial IntelligenceDec-9-2019

In principle, meta-reinforcement learning algorithms leverage experience across many tasks to learn fast reinforcement learning (RL) strategies that transfer to similar tasks. However, current meta-RL approaches rely on manually-defined distributions of training tasks, and hand-crafting these task distributions can be challenging and time-consuming. Can "useful" pre-training tasks be discovered in an unsupervised manner? We develop an unsupervised algorithm for inducing an adaptive meta-training task distribution, i.e. an automatic curriculum, by modeling unsupervised interaction in a visual environment. The task distribution is scaffolded by a parametric density model of the meta-learner's trajectory distribution. We formulate unsupervised meta-RL as information maximization between a latent task variable and the meta-learner's data distribution, and describe a practical instantiation which alternates between integration of recent experience into the task distribution and meta-learning of the updated tasks. Repeating this procedure leads to iterative reorganization such that the curriculum adapts as the meta-learner's data distribution shifts. In particular, we show how discriminative clustering for visual representation can support trajectory-level task acquisition and exploration in domains with pixel observations, avoiding pitfalls of alternatives. In experiments on vision-based navigation and manipulation domains, we show that the algorithm allows for unsupervised meta-learning that transfers to downstream tasks specified by hand-crafted reward functions and serves as pre-training for more efficient supervised meta-learning of test task distributions.

exploration, task distribution, trajectory, (13 more...)

arXiv.org Artificial Intelligence

1912.04226

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)

Genre: Research Report (1.00)

Industry: Education (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Learning Latent State Spaces for Planning through Reward Prediction

Havens, Aaron, Ouyang, Yi, Nagarajan, Prabhat, Fujita, Yasuhiro

arXiv.org Artificial IntelligenceDec-9-2019

Model-based reinforcement learning methods typically learn models for high-dimensional state spaces by aiming to reconstruct and predict the original observations. However, drawing inspiration from model-free reinforcement learning, we propose learning a latent dynamics model directly from rewards. In this work, we introduce a model-based planning framework which learns a latent reward prediction model and then plans in the latent state-space. The latent representation is learned exclusively from multi-step reward prediction which we show to be the only necessary information for successful planning. With this framework, we are able to benefit from the concise model-free representation, while still enjoying the data-efficiency of model-based algorithms. We demonstrate our framework in multi-pendulum and multi-cheetah environments where several pendulums or cheetahs are shown to the agent but only one of which produces rewards. In these environments, it is important for the agent to construct a concise latent representation to filter out irrelevant observations. We find that our method can successfully learn an accurate latent reward prediction model in the presence of the irrelevant information while existing model-based methods fail. Planning in the learned latent state-space shows strong performance and high sample efficiency over model-free and model-based baselines.

prediction model, representation, reward prediction model, (14 more...)

arXiv.org Artificial Intelligence

1912.04201

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > Illinois > Champaign County > Champaign (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

ChainerRL: A Deep Reinforcement Learning Library

Fujita, Yasuhiro, Kataoka, Toshiki, Nagarajan, Prabhat, Ishikawa, Takahiro

arXiv.org Artificial IntelligenceDec-9-2019

In this paper, we introduce ChainerRL, an open-source Deep Reinforcement Learning (DRL) library built using Python and the Chainer deep learning framework. ChainerRL implements a comprehensive set of DRL algorithms and techniques drawn from the state-of-the-art research in the field. To foster reproducible research, and for instructional purposes, ChainerRL provides scripts that closely replicate the original papers' experimental settings and reproduce published benchmark results for several algorithms. Lastly, ChainerRL offers a visualization tool that enables the qualitative inspection of trained agents. The ChainerRL source code can be found on GitHub: https://github.com/chainer/chainerrl .

agent, algorithm, chainerrl, (12 more...)

arXiv.org Artificial Intelligence

1912.03905

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)
North America > Canada (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Value-of-Information based Arbitration between Model-based and Model-free Control

Bera, Krishn, Mandilwar, Yash, Raju, Bapi

arXiv.org Artificial IntelligenceDec-8-2019

There have been numerous attempts in explaining the general learning behaviours using model-based and model-free methods. While the model-based control is flexible yet computationally expensive in planning, the model-free control is quick but inflexible. The model-based control is therefore immune from reward devaluation and contingency degradation. Multiple arbitration schemes have been suggested to achieve the data efficiency and computational efficiency of model-based and model-free control respectively. In this context, we propose a quantitative 'value of information' based arbitration between both the controllers in order to establish a general computational framework for skill learning. The interacting model-based and model-free reinforcement learning processes are arbitrated using an uncertainty-based value of information. We further show that our algorithm performs better than Q-learning as well as Q-learning with experience replay.

agent, learning, reinforcement, (16 more...)

arXiv.org Artificial Intelligence

1912.05453

Country: North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report (0.64)

Industry: Law > Alternative Dispute Resolution (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Intelligent Coordination among Multiple Traffic Intersections Using Multi-Agent Reinforcement Learning

Tewari, Ujwal Padam, Bidawatka, Vishal, Raveendran, Varsha, Sudhakaran, Vinay

arXiv.org Artificial IntelligenceDec-8-2019

We use Asynchronous Advantage Actor Critic (A3C) for implementing an AI agent in the controllers that optimize flow of traffic across a single intersection and then extend it to multiple intersections by considering a multi-agent setting. We explore three different methodologies to address the multi-agent problem - (1) use of asynchronous property of A3C to control multiple intersections using a single agent (2) utilise self/competitive play among independent agents across multiple intersections and (3) ingest a global reward function among agents to introduce cooperative behavior between intersections. We observe that (1) & (2) leads to a reduction in traffic congestion. Additionally the use of (3) with (1) & (2) led to a further reduction in congestion.

agent, intersection, reward function, (13 more...)

arXiv.org Artificial Intelligence

1912.03851

Country:

Asia > India > Karnataka > Bengaluru (0.05)
North America > Canada (0.04)

Genre: Research Report (0.82)

Industry:

Transportation (0.71)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Decentralized Multi-Agent Reinforcement Learning with Networked Agents: Recent Advances

Zhang, Kaiqing, Yang, Zhuoran, Başar, Tamer

arXiv.org Artificial IntelligenceDec-8-2019

Multi-agent reinforcement learning (MARL) has long been a significant and everlasting research topic in both machine learning and control. With the recent development of (single-agent) deep RL, there is a resurgence of interests in developing new MARL algorithms, especially those that are backed by theoretical analysis. In this paper, we review some recent advances a sub-area of this topic: decentralized MARL with networked agents. Specifically, multiple agents perform sequential decision-making in a common environment, without the coordination of any central controller. Instead, the agents are allowed to exchange information with their neighbors over a communication network. Such a setting finds broad applications in the control and operation of robots, unmanned vehicles, mobile sensor networks, and smart grid. This review is built upon several our research endeavors in this direction, together with some progresses made by other researchers along the line. We hope this review to inspire the devotion of more research efforts to this exciting yet challenging area.

agent, algorithm, function approximation, (10 more...)

arXiv.org Artificial Intelligence

1912.03821

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > Illinois > Champaign County > Champaign (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany (0.04)

Genre: Overview (1.00)

Industry:

Leisure & Entertainment > Games (0.68)
Energy > Power Industry (0.48)
Transportation (0.46)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback