Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning
Neural Information Processing Systems
We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL). We propose a unified algorithmic framework for randomized exploration in parallel Markov decision processes (MDPs), and two Thompson sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC, which incorporate the perturbed-history exploration (PHE) strategy and the Langevin Monte Carlo (LMC) exploration strategy, respectively; both are flexible in design and easy to implement in practice. For a special class of parallel MDPs where the transition is (approximately) linear, we theoretically prove that both CoopTS-PHE and CoopTS-LMC achieve a \widetilde{\mathcal{O}}(d^{3/2}H^2\sqrt{MK}) regret bound with communication complexity \widetilde{\mathcal{O}}(dHM^2), where d is the feature dimension, H is the horizon length, M is the number of agents, and K is the number of episodes. This is the first theoretical result for randomized exploration in cooperative MARL. We evaluate our proposed method on multiple parallel RL environments, including a deep exploration problem (i.e., N-chain), a video game, and a real-world problem in energy systems.
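For intuition, here is a minimal sketch of the two randomized-exploration updates on linear features. This is not the paper's implementation: all variable names, shapes, and hyperparameters (`sigma`, `lam`, `eta`, `beta`) are illustrative assumptions.

```python
# Hypothetical sketch of PHE-style and LMC-style exploration updates
# on linear features; not the authors' code, all scales are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 100                      # feature dimension, history size
Phi = rng.normal(size=(n, d))      # historical features phi(s, a)
y = Phi @ rng.normal(size=d)       # historical regression targets
lam, sigma = 1.0, 0.1              # ridge and perturbation scales

# PHE-style update: perturb the historical targets with Gaussian noise,
# then solve a ridge regression on the perturbed history.
y_pert = y + sigma * rng.normal(size=n)
A = Phi.T @ Phi + lam * np.eye(d)
theta_phe = np.linalg.solve(A, Phi.T @ y_pert)

# LMC-style update: noisy gradient descent on the ridge loss, i.e.
# theta <- theta - eta * grad + sqrt(2 * eta / beta) * xi  (Langevin step).
eta, beta = 1e-3, 1.0
theta_lmc = np.zeros(d)
for _ in range(200):
    grad = Phi.T @ (Phi @ theta_lmc - y) + lam * theta_lmc
    theta_lmc += -eta * grad + np.sqrt(2 * eta / beta) * rng.normal(size=d)
```

Both updates produce a randomized value estimate, so exploration comes from the injected noise rather than from explicit optimism bonuses.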