Minimax-Optimal Multi-Agent Robust Reinforcement Learning

Yuchen Jiao, Gen Li

arXiv.org Artificial Intelligence 

The rapidly evolving field of multi-agent reinforcement learning (MARL), commonly formulated as Markov games (MGs) (Littman, 1994; Shapley, 1953), studies how a group of agents interacting in a shared, dynamic environment can each maximize its own expected cumulative reward (Zhang et al., 2020a; Lanctot et al., 2019; Silver et al., 2017; Vinyals et al., 2019). This area has found wide application in fields such as ecosystem management (Fang et al., 2015), strategic decision-making in board games (Silver et al., 2017), management science (Saloner, 1991), and autonomous driving (Zhou et al., 2020).

However, in real-world applications, environmental uncertainties, stemming from factors such as system noise, model misalignment, and the sim-to-real gap, can significantly alter both the qualitative outcomes of the game and the cumulative rewards that agents receive (Slumbers et al., 2023). It has been demonstrated that when solutions learned in a simulated environment are deployed, even a small deviation of the deployed environment from the expected model can result in catastrophic performance drops for one or more agents (Shi et al., 2024c; Balaji et al., 2019; Yeh et al., 2021; Zeng et al., 2022; Zhang et al., 2020b). These challenges motivate the study of robust Markov games (RMGs), in which each agent aims to maximize its worst-case cumulative reward over an uncertainty set of transition models centered around an unknown nominal model.

Given the competitive nature of the game, the objective in RMGs is to reach an equilibrium at which no agent has an incentive to unilaterally change its policy to increase its own payoff. A classical notion is the robust Nash equilibrium (NE) (Nash Jr, 1950), in which each agent's policy is independent and no agent can improve its worst-case performance by deviating from its current strategy. Due to the high computational cost of solving robust NEs, especially in games with more than two agents, this concept is often relaxed to the robust coarse correlated equilibrium (CCE), in which agents' policies may be correlated (Moulin & Vial, 1978). In the context of RMGs, achieving such an equilibrium with as few samples as possible is of particular interest, since data is often limited in practical applications.
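To make these equilibrium notions concrete, a standard formalization (the notation below is an illustrative sketch and may differ from the paper's exact conventions) considers a finite-horizon RMG with $n$ agents and horizon $H$, where agent $i$'s robust value under a joint policy $\pi$ is its worst-case return over an uncertainty set $\mathcal{U}^{\sigma_i}(P^0)$ of radius $\sigma_i$ centered at the nominal transition kernel $P^0$:
$$
V_i^{\pi,\sigma_i}(s) \;=\; \inf_{P \in \mathcal{U}^{\sigma_i}(P^0)} \mathbb{E}_{\pi,P}\!\left[\sum_{h=1}^{H} r_{i,h}\big(s_h,\boldsymbol{a}_h\big) \,\middle|\, s_1 = s\right].
$$
An $\varepsilon$-approximate robust NE is then a product policy $\pi = \pi_1 \times \cdots \times \pi_n$ whose worst-case best-response gap is small at the initial state:
$$
\max_{1 \le i \le n}\left(\max_{\pi_i'} V_i^{\pi_i' \times \pi_{-i},\,\sigma_i}(s_1) \;-\; V_i^{\pi,\sigma_i}(s_1)\right) \;\le\; \varepsilon,
$$
and a robust CCE imposes the same gap condition while allowing the joint policy $\pi$ to be correlated across agents; setting $\varepsilon = 0$ recovers the exact equilibria.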