AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

Rashid, Tabish, Samvelyan, Mikayel, de Witt, Christian Schroeder, Farquhar, Gregory, Foerster, Jakob, Whiteson, Shimon

arXiv.org Machine LearningMar-19-2020

In many real-world settings, a team of agents must coordinate its behaviour while acting in a decentralised fashion. At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a mixing network that estimates joint action-values as a monotonic combination of per-agent values. We structurally enforce that the joint-action value is monotonic in the per-agent values, through the use of non-negative weights in the mixing network, which guarantees consistency between the centralised and decentralised policies. To evaluate the performance of QMIX, we propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning. We evaluate QMIX on a challenging set of SMAC scenarios and show that it significantly outperforms existing multi-agent reinforcement learning methods.

agent, qmix, scenario, (15 more...)

arXiv.org Machine Learning

2003.08839

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Denmark (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

IG-RL: Inductive Graph Reinforcement Learning for Massive-Scale Traffic Signal Control

Devailly, François-Xavier, Larocque, Denis, Charlin, Laurent

arXiv.org Machine LearningMar-19-2020

Scaling adaptive traffic-signal control involves dealing with combinatorial state and action spaces. Multi-agent reinforcement learning attempts to address this challenge by distributing control to specialized agents. However, specialization hinders generalization and transferability, and the computational graphs underlying neural-networks architectures---dominating in the multi-agent setting---do not offer the flexibility to handle an arbitrary number of entities which changes both between road networks, and over time as vehicles traverse the network. We introduce Inductive Graph Reinforcement Learning (IG-RL) based on graph-convolutional networks which adapts to the structure of any road network, to learn detailed representations of traffic-controllers and their surroundings. Our decentralized approach enables learning of a transferable-adaptive-traffic-signal-control policy. After being trained on an arbitrary set of road networks, our model can generalize to new road networks, traffic distributions, and traffic regimes, with no additional training and a constant number of parameters, enabling greater scalability compared to prior methods. Furthermore, our approach can exploit the granularity of available data by capturing the (dynamic) demand at both the lane and the vehicle levels. The proposed method is tested on both road networks and traffic settings never experienced during training. We compare IG-RL to multi-agent reinforcement learning and domain-specific baselines. In both synthetic road networks and in a larger experiment involving the control of the 3,971 traffic signals of Manhattan, we show that different instantiations of IG-RL outperform baselines.

intersection, road network, vehicle, (14 more...)

arXiv.org Machine Learning

2003.05738

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.82)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Exchangeable Input Representations for Reinforcement Learning

Mern, John, Sadigh, Dorsa, Kochenderfer, Mykel J.

arXiv.org Artificial IntelligenceMar-19-2020

Poor sample efficiency is a major limitation of deep reinforcement learning in many domains. This work presents an attention-based method to project neural network inputs into an efficient representation space that is invariant under changes to input ordering. We show that our proposed representation results in an input space that is a factor of $m!$ smaller for inputs of $m$ objects. We also show that our method is able to represent inputs over variable numbers of objects. Our experiments demonstrate improvements in sample efficiency for policy gradient methods on a variety of tasks. We show that our representation allows us to solve problems that are otherwise intractable when using na\"ive approaches.

attention mechanism, representation, vector, (12 more...)

arXiv.org Artificial Intelligence

2003.09022

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Learning to Fly via Deep Model-Based Reinforcement Learning

Becker-Ehmck, Philip, Karl, Maximilian, Peters, Jan, van der Smagt, Patrick

arXiv.org Artificial IntelligenceMar-19-2020

Learning to control robots without requiring models has been a long-term goal, promising diverse and novel applications. Yet, reinforcement learning has only achieved limited impact on real-time robot control due to its high demand of real-world interactions. In this work, by leveraging a learnt probabilistic model of drone dynamics, we achieve human-like quadrotor control through model-based reinforcement learning. No prior knowledge of the flight dynamics is assumed; instead, a sequential latent variable model, used generatively and as an online filter, is learnt from raw sensory input. The controller and value function are optimised entirely by propagating stochastic analytic gradients through generated latent trajectories. We show that "learning to fly" can be achieved with less than 30 minutes of experience with a single drone, and can be deployed solely using onboard computational resources and sensors, on a self-built drone.

controller, drone, international conference, (14 more...)

arXiv.org Artificial Intelligence

2003.08876

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre:

Research Report (0.50)
Personal (0.46)

Industry:

Transportation > Air (1.00)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.93)

Add feedback

Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning

Farquhar, Gregory, Whiteson, Shimon, Foerster, Jakob

Neural Information Processing SystemsMar-18-2020, 23:48:26 GMT

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives. We derive an objective that, under automatic differentiation, produces low-variance unbiased estimators of derivatives at any order. Our objective is compatible with arbitrary advantage estimators, which allows the control of the bias and variance of any-order derivatives when using function approximation. Furthermore, we propose a method to trade off bias and variance of higher order derivatives by discounting the impact of more distant causal dependencies. We demonstrate the correctness and utility of our estimator in analytically tractable MDPs and in meta-reinforcement-learning for continuous control.

any-order score function gradient estimator, bias and variance, reinforcement learning, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning

Carion, Nicolas, Usunier, Nicolas, Synnaeve, Gabriel, Lazaric, Alessandro

Neural Information Processing SystemsMar-18-2020, 23:48:18 GMT

Effective coordination is crucial to solve multi-agent collaborative (MAC) problems. While centralized reinforcement learning methods can optimally solve small MAC instances, they do not scale to large problems and they fail to generalize to scenarios different from those seen during training. In this paper, we consider MAC problems with some intrinsic notion of locality (e.g., geographic proximity) such that interactions between agents and tasks are locally limited. By leveraging this property, we introduce a novel structured prediction approach to assign agents to tasks. At each step, the assignment is obtained by solving a centralized optimization problem (the inference procedure) whose objective function is parameterized by a learned scoring model.

cooperative multi-agent reinforcement learning, generalization, structured prediction approach, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.40)

Add feedback

Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle

Du, Simon S., Luo, Yuping, Wang, Ruosong, Zhang, Hanrui

Neural Information Processing SystemsMar-18-2020, 23:47:57 GMT

Q-learning with function approximation is one of the most popular methods in reinforcement learning. Though the idea of using function approximation was proposed at least 60 years ago, even in the simplest setup, i.e, approximating Q-functions with linear functions, it is still an open problem how to design a provably efficient algorithm that learns a near-optimal policy. The key challenges are how to efficiently explore the state space and how to decide when to stop exploring in conjunction with the function approximation scheme. The current paper presents a provably efficient algorithm for Q-learning with linear function approximation. Under certain regularity assumptions, our algorithm, Difference Maximization Q-learning, combined with linear function approximation, returns a near-optimal policy using polynomial number of trajectories.

approximation, function approximation, q-learning, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Provably Efficient Q-Learning with Low Switching Cost

Bai, Yu, Xie, Tengyang, Jiang, Nan, Wang, Yu-Xiang

Neural Information Processing SystemsMar-18-2020, 23:47:40 GMT

We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change its exploration policy as infrequently as possible during regret minimization. This is motivated by the difficulty of running fully adaptive algorithms in real-world applications (such as medical domains), and we propose to quantify adaptivity using the notion of \emph{local switching cost}. Our main contribution, Q-Learning with UCB2 exploration, is a model-free algorithm for $H$-step episodic MDP that achieves sublinear regret whose local switching cost in $K$ episodes is $O(H 3SA\log K)$, and we provide a lower bound of $\Omega(HSA)$ on the local switching cost for any no-regret algorithm. Our algorithm can be naturally adapted to the concurrent setting \citep{guo2015concurrent}, which yields nontrivial results that improve upon prior work in certain aspects. Papers published at the Neural Information Processing Systems Conference.

algorithm, local switching cost, provably efficient q-learning, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies

Ghasemipour, Seyed Kamyar Seyed, Gu, Shixiang (Shane), Zemel, Richard

Neural Information Processing SystemsMar-18-2020, 23:47:01 GMT

Imitation Learning (IL) has been successfully applied to complex sequential decision-making problems where standard Reinforcement Learning (RL) algorithms fail. A number of recent methods extend IL to few-shot learning scenarios, where a meta-trained policy learns to quickly master new tasks using limited demonstrations. However, although Inverse Reinforcement Learning (IRL) often outperforms Behavioral Cloning (BC) in terms of imitation quality, most of these approaches build on BC due to its simple optimization objective. In this work, we propose SMILe, a scalable framework for Meta Inverse Reinforcement Learning (Meta-IRL) based on maximum entropy IRL, which can learn high-quality policies from few demonstrations. We examine the efficacy of our method on a variety of high-dimensional simulated continuous control tasks and observe that SMILe significantly outperforms Meta-BC.

inverse reinforcement learning, meta inverse reinforcement learning, reinforcement learning, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment

Leibfried, Felix, Pascual-Díaz, Sergio, Grau-Moya, Jordi

Neural Information Processing SystemsMar-18-2020, 23:46:58 GMT

Empowerment is an information-theoretic method that can be used to intrinsically motivate learning agents. It attempts to maximize an agent's control over the environment by encouraging visiting states with a large number of reachable next states. Empowered learning has been shown to lead to complex behaviors, without requiring an explicit reward signal. In this paper, we investigate the use of empowerment in the presence of an extrinsic reward signal. We hypothesize that empowerment can guide reinforcement learning (RL) agents to find good early behavioral solutions by encouraging highly empowered states.

combining reward maximization and empowerment, optimality principle combining reward maximization, reward signal, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.82)

Add feedback