Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model

Neural Information Processing Systems 

This paper studies multi-agent reinforcement learning (RL) in Markov games, with the goal of learning Nash equilibria or coarse correlated equilibria (CCE) sample-optimally. All prior results suffer from at least one of two obstacles: the curse of multiple agents and the barrier of long horizon, regardless of the sampling protocol in use. We take a step towards settling this problem, assuming access to a flexible sampling mechanism: the generative model. Focusing on non-stationary finite-horizon Markov games, we develop a fast learning algorithm called Q-FTRL and an adaptive sampling scheme that leverage the optimism principle in online adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method).
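To make the FTRL component concrete, the following is the standard entropy-regularized FTRL update over the probability simplex; this is a generic sketch, and the exact regularizer, reward estimates, and learning-rate schedule used by Q-FTRL are assumptions here rather than the paper's specification:
\[
\pi_{t+1} \;=\; \arg\max_{\pi \in \Delta(\mathcal{A})} \Big\{ \Big\langle \pi, \sum_{s=1}^{t} \hat{g}_s \Big\rangle \;-\; \frac{1}{\eta} \sum_{a \in \mathcal{A}} \pi(a) \log \pi(a) \Big\},
\]
whose closed-form solution is the exponential-weights rule
\[
\pi_{t+1}(a) \;\propto\; \exp\Big( \eta \sum_{s=1}^{t} \hat{g}_s(a) \Big),
\]
where $\Delta(\mathcal{A})$ denotes the simplex over the action set $\mathcal{A}$, $\hat{g}_s$ is an estimated reward vector at round $s$, and $\eta > 0$ is the learning rate.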