Regret Lower Bounds for Decentralized Multi-Agent Stochastic Shortest Path Problems
–Neural Information Processing Systems
Multi-agent systems (MAS) are central to applications such as swarm robotics and traffic routing, where agents must coordinate in a decentralized manner to achieve a common objective. Stochastic Shortest Path (SSP) problems provide a natural framework for modeling decentralized control in such settings. While the problem of learning in SSP has been extensively studied in single-agent settings, the decentralized multi-agent variant remains largely unexplored. In this work, we take a step towards addressing that gap. We study decentralized multi-agent SSPs (Dec-MASSPs) under linear function approximation, where the transition dynamics and costs are represented using linear models. Applying novel symmetry-based arguments, we identify the structure of optimal policies. Our main contribution is the first regret lower bound for this setting based on the construction of hard-tolearn instances for any number of agents, n. Our regret lower bound of Ω( K), over K episodes, highlights the inherent learning difficulty in Dec-MASSPs. These insights clarify the learning complexity of decentralized control and can further guide the design of efficient learning algorithms in multi-agent systems.
Neural Information Processing Systems
Jun-23-2026, 04:21:32 GMT
- Genre:
- Research Report > Experimental Study (0.92)
- Industry:
- Information Technology (0.45)
- Transportation (0.45)
- Technology: