Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model
Neural Information Processing Systems
All prior results suffer from at least one of the two obstacles: the curse of multiple agents and the barrier of long horizon, regardless of the sampling protocol in use. We take a step towards settling this problem, assuming access to a flexible sampling mechanism: the generative model. Focusing on non-stationary finite-horizon Markov games, we develop a fast learning algorithm called Q-FTRL and an adaptive sampling scheme that leverage the optimism principle in online adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method).
Mar-23-2025, 04:50:37 GMT
- Genre:
- Instructional Material (0.34)
- Research Report (0.46)
- Industry:
- Leisure & Entertainment (0.93)
- Technology: