MAVEN: Multi-Agent Variational Exploration

Oct-11-2024, 06:48:07 GMT–Neural Information Processing Systems

Centralised training with decentralised execution is an important setting for cooperative deep multi-agent reinforcement learning due to communication constraints during execution and computational tractability in training. In this paper, we analyse value-based methods that are known to have superior performance in complex environments. We specifically focus on QMIX, the current state-of-the-art in this domain. We show that the representation constraints on the joint action-values introduced by QMIX and similar methods lead to provably poor exploration and suboptimality. Furthermore, we propose a novel approach called MAVEN that hybridises value and policy-based methods by introducing a latent space for hierarchical control.

constraint, maven, multi-agent variational exploration




Genre:
- Research Report (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents (0.71)
  - Machine Learning > Reinforcement Learning (0.65)