state-action value function
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
- North America > Canada > Alberta (0.14)
- Asia > Middle East > Jordan (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- (4 more...)
ResQ: A Residual Q Function-based Approach for Multi-Agent Reinforcement Learning Value Factorization
The factorization of state-action value functions for Multi-Agent Reinforcement Learning (MARL) is important. Existing studies are limited by their representation capability, sample efficiency, and approximation error. To address these challenges, we propose, ResQ, a MARL value function factorization method, which can find the optimal joint policy for any state-action value function through residual functions. ResQ masks some state-action value pairs from a joint state-action value function, which is transformed as the sum of a main function and a residual function. ResQ can be used with mean-value and stochastic-value RL. We theoretically show that ResQ can satisfy both the individual global max (IGM) and the distributional IGM principle without representation limitations. Through experiments on matrix games, the predator-prey, and StarCraft benchmarks, we show that ResQ can obtain better results than multiple expected/stochastic value factorization methods.
- North America > Canada > Alberta (0.14)
- Asia > Middle East > Jordan (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
InterQ: A DQN Framework for Optimal Intermittent Control
Aggarwal, Shubham, Maity, Dipankar, Başar, Tamer
In this letter, we explore the communication-control co-design of discrete-time stochastic linear systems through reinforcement learning. Specifically, we examine a closed-loop system involving two sequential decision-makers: a scheduler and a controller. The scheduler continuously monitors the system's state but transmits it to the controller intermittently to balance the communication cost and control performance. The controller, in turn, determines the control input based on the intermittently received information. Given the partially nested information structure, we show that the optimal control policy follows a certainty-equivalence form. Subsequently, we analyze the qualitative behavior of the scheduling policy. To develop the optimal scheduling policy, we propose InterQ, a deep reinforcement learning algorithm which uses a deep neural network to approximate the Q-function. Through extensive numerical evaluations, we analyze the scheduling landscape and further compare our approach against two baseline strategies: (a) a multi-period periodic scheduling policy, and (b) an event-triggered policy. The results demonstrate that our proposed method outperforms both baselines. The open source implementation can be found at https://github.com/AC-sh/InterQ.
- North America > United States > Illinois > Champaign County > Urbana (0.14)
- North America > United States > North Carolina > Mecklenburg County > Charlotte (0.04)
ResQ: A Residual Q Function-based Approach for Multi-Agent Reinforcement Learning Value Factorization
The factorization of state-action value functions for Multi-Agent Reinforcement Learning (MARL) is important. Existing studies are limited by their representation capability, sample efficiency, and approximation error. To address these challenges, we propose, ResQ, a MARL value function factorization method, which can find the optimal joint policy for any state-action value function through residual functions. ResQ masks some state-action value pairs from a joint state-action value function, which is transformed as the sum of a main function and a residual function. ResQ can be used with mean-value and stochastic-value RL.
Deterministic Trajectory Optimization through Probabilistic Optimal Control
Filabadi, Mohammad Mahmoudi, Lefebvre, Tom, Crevecoeur, Guillaume
This article proposes two new algorithms tailored to discrete-time deterministic finite-horizon nonlinear optimal control problems or so-called trajectory optimization problems. Both algorithms are inspired by a novel theoretical paradigm known as probabilistic optimal control, that reformulates optimal control as an equivalent probabilistic inference problem. This perspective allows to address the problem using the Expectation-Maximization algorithm. We show that the application of this algorithm results in a fixed point iteration of probabilistic policies that converge to the deterministic optimal policy. Two strategies for policy evaluation are discussed, using state-of-the-art uncertainty quantification methods resulting into two distinct algorithms. The algorithms are structurally closest related to the differential dynamic programming algorithm and related methods that use sigma-point methods to avoid direct gradient evaluations. The main advantage of our work is an improved balance between exploration and exploitation over the iterations, leading to improved numerical stability and accelerated convergence. These properties are demonstrated on different nonlinear systems.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- (2 more...)
- Energy (0.34)
- Health & Medicine (0.34)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
A Distributed Approach to Autonomous Intersection Management via Multi-Agent Reinforcement Learning
Cederle, Matteo, Fabris, Marco, Susto, Gian Antonio
Autonomous intersection management (AIM) poses significant challenges due to the intricate nature of real-world traffic scenarios and the need for a highly expensive centralised server in charge of simultaneously controlling all the vehicles. This study addresses such issues by proposing a novel distributed approach to AIM utilizing multi-agent reinforcement learning (MARL). We show that by leveraging the 3D surround view technology for advanced assistance systems, autonomous vehicles can accurately navigate intersection scenarios without needing any centralised controller. The contributions of this paper thus include a MARL-based algorithm for the autonomous management of a 4-way intersection and also the introduction of a new strategy called prioritised scenario replay for improved training efficacy. We validate our approach as an innovative alternative to conventional centralised AIM techniques, ensuring the full reproducibility of our results. Specifically, experiments conducted in virtual environments using the SMARTS platform highlight its superiority over benchmarks across various metrics.
- Europe > Italy (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Kernelized Reinforcement Learning with Order Optimal Regret Bounds
Vakili, Sattar, Olkhovskaya, Julia
Reinforcement learning (RL) has shown empirical success in various real world settings with complex models and large state-action spaces. The existing analytical results, however, typically focus on settings with a small number of state-actions or simple models such as linearly modeled state-action value functions. To derive RL policies that efficiently handle large state-action spaces with more general value functions, some recent works have considered nonlinear function approximation using kernel ridge regression. We propose $\pi$-KRVI, an optimistic modification of least-squares value iteration, when the state-action value function is represented by a reproducing kernel Hilbert space (RKHS). We prove the first order-optimal regret guarantees under a general setting. Our results show a significant polynomial in the number of episodes improvement over the state of the art. In particular, with highly non-smooth kernels (such as Neural Tangent kernel or some Mat\'ern kernels) the existing results lead to trivial (superlinear in the number of episodes) regret bounds. We show a sublinear regret bound that is order optimal in the case of Mat\'ern kernels where a lower bound on regret is known.
- North America > Canada > Alberta (0.14)
- Asia > Middle East > Jordan (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- (4 more...)