Differentiable Arbitrating in Zero-sum Markov Games
Wang, Jing; Song, Meichen; Gao, Feng; Liu, Boyi; Wang, Zhaoran; Wu, Yi
arXiv.org Artificial Intelligence
We initiate the study of how to perturb the reward in a two-player zero-sum Markov game so as to induce a desirable Nash equilibrium (NE), a task we call arbitrating. The problem admits a bi-level optimization formulation: the lower level requires solving for the NE under a given reward function, which makes the overall problem challenging to optimize end-to-end. We propose a backpropagation scheme that differentiates through the NE and thereby provides gradient feedback for the upper level. In particular, our method requires only a black-box solver for the (regularized) NE. We provide a convergence analysis of the proposed framework under suitable black-box NE solvers and demonstrate its empirical success in two multi-agent reinforcement learning (MARL) environments.
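The bi-level structure described in the abstract can be sketched as follows. The notation is assumed here for illustration and is not taken from the paper: $\theta$ parameterizes the perturbed reward $r_\theta$, $V_{r_\theta}(\pi^1, \pi^2)$ is the value of the max-player under policies $\pi^1, \pi^2$, and $\mathcal{L}$ is a hypothetical upper-level loss measuring how far the induced equilibrium is from the desired one.

```latex
% Upper level: choose the reward perturbation (notation assumed, not the paper's)
\min_{\theta} \; \mathcal{L}\bigl(\pi^{1}_{\theta}, \pi^{2}_{\theta}\bigr)

% Lower level: (\pi^1_\theta, \pi^2_\theta) is a Nash equilibrium of the
% zero-sum game with reward r_\theta
\text{s.t.} \quad
\pi^{1}_{\theta} \in \arg\max_{\pi^{1}} \min_{\pi^{2}} V_{r_\theta}(\pi^{1}, \pi^{2}),
\qquad
\pi^{2}_{\theta} \in \arg\min_{\pi^{2}} \max_{\pi^{1}} V_{r_\theta}(\pi^{1}, \pi^{2})

% Gradient feedback for the upper level via the chain rule,
% where the Jacobians of the equilibrium policies come from
% differentiating through the NE
\nabla_{\theta} \mathcal{L}
= \Bigl(\frac{\partial \pi^{1}_{\theta}}{\partial \theta}\Bigr)^{\!\top} \nabla_{\pi^{1}} \mathcal{L}
+ \Bigl(\frac{\partial \pi^{2}_{\theta}}{\partial \theta}\Bigr)^{\!\top} \nabla_{\pi^{2}} \mathcal{L}
```

Under this reading, the black-box solver supplies $(\pi^{1}_{\theta}, \pi^{2}_{\theta})$ in the forward pass. The Jacobians are well defined only when the equilibrium is unique and varies smoothly with $\theta$, which is presumably one reason a regularized NE is used.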
Feb-20-2023