Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model
Gen Li (UPenn), Yuejie Chi (CMU), Yuting Wei (UPenn), Yuxin Chen (UPenn)
Neural Information Processing Systems
All prior results suffer from at least one of two obstacles, the curse of multiple agents and the barrier of long horizons, regardless of the sampling protocol in use. We take a step towards settling this problem, assuming access to a flexible sampling mechanism: the generative model. Focusing on non-stationary finite-horizon Markov games, we develop a fast learning algorithm called Q-FTRL and an adaptive sampling scheme that leverage the optimism principle in online adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method).
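To make the FTRL building block concrete, here is a minimal sketch, assuming the textbook setting of a two-player zero-sum matrix game with full-information feedback and a negative-entropy regularizer (which makes FTRL coincide with exponential weights). This is not the paper's Q-FTRL: it omits the Markov-game structure, the generative-model sampling, the adaptive sampling scheme and any optimism term. The function names `ftrl_entropy` and `self_play` and the step size `eta` are illustrative assumptions.

```python
import numpy as np


def ftrl_entropy(cum_reward, eta):
    """FTRL over the probability simplex with a negative-entropy regularizer.

    The maximizer of <pi, cum_reward> - (1/eta) * sum_a pi(a) log pi(a)
    has the closed form pi proportional to exp(eta * cum_reward), i.e. a softmax.
    """
    z = eta * np.asarray(cum_reward, dtype=float)
    z = z - z.max()  # shift for numerical stability; leaves the softmax unchanged
    w = np.exp(z)
    return w / w.sum()


def self_play(payoff, num_rounds=20000, eta=0.01):
    """Both players of a zero-sum matrix game run FTRL on their own rewards.

    payoff[i, j] is the row player's reward; the column player receives -payoff[i, j].
    Returns the time-averaged mixed strategies, which approximate a Nash equilibrium.
    """
    payoff = np.asarray(payoff, dtype=float)
    m, n = payoff.shape
    cum_row, cum_col = np.zeros(m), np.zeros(n)
    avg_row, avg_col = np.zeros(m), np.zeros(n)
    for _ in range(num_rounds):
        # Each player plays the FTRL strategy given its cumulative past rewards.
        mu = ftrl_entropy(cum_row, eta)
        nu = ftrl_entropy(cum_col, eta)
        avg_row += mu
        avg_col += nu
        # Expected reward of every pure action against the opponent's current mixed strategy.
        cum_row += payoff @ nu
        cum_col += -(payoff.T @ mu)
    return avg_row / num_rounds, avg_col / num_rounds
```

For example, `self_play([[2.0, -1.0], [-1.0, 1.0]])` should return averaged strategies close to `(0.4, 0.6)` for both players, the unique equilibrium of that game. Averaging matters here: the per-round iterates of FTRL may cycle, whereas the sum of both players' average regrets bounds the duality gap of the averaged strategies.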
Aug-15-2025, 07:38:12 GMT