Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model

Oct-11-2024, 07:57:47 GMT–Neural Information Processing Systems

All prior results suffer from at least one of the two obstacles: the curse of multiple agents and the barrier of long horizon, regardless of the sampling protocol in use. We take a step towards settling this problem, assuming access to a flexible sampling mechanism: the generative model. Focusing on non-stationary finite-horizon Markov games, we develop a fast learning algorithm called Q-FTRL and an adaptive sampling scheme that leverage the optimism principle in online adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method). Our algorithm learns an \varepsilon -approximate CCE in a general-sum Markov game using \widetilde{O}\bigg( \frac{H 4 S \sum_{i 1} m A_i}{\varepsilon 2} \bigg) samples, where m is the number of players, S indicates the number of states, H is the horizon, and A_i denotes the number of actions for the i -th player. This is minimax-optimal (up to log factor) when m is fixed.

generative model, markov game, minimax-optimal multi-agent rl, (3 more...)

Neural Information Processing Systems

Oct-11-2024, 07:57:47 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Generation (0.64)
  - Machine Learning > Reinforcement Learning (0.59)
  - Representation & Reasoning
    - Agents (0.77)
    - Search (0.64)