Differentiable Arbitrating in Zero-sum Markov Games

Jing Wang, Meichen Song, Feng Gao, Boyi Liu, Zhaoran Wang, Yi Wu

Feb-20-2023–arXiv.org Artificial Intelligence

We initiate the study of arbitrating: perturbing the reward in a two-player zero-sum Markov game to induce a desirable Nash equilibrium (NE). The problem admits a bi-level optimization formulation. The lower level requires solving for the NE under a given reward function, which makes the overall problem challenging to optimize end to end. We propose a backpropagation scheme that differentiates through the NE and thereby provides gradient feedback for the upper level. In particular, our method requires only a black-box solver for the (regularized) NE. We develop a convergence analysis for the proposed framework with proper black-box NE solvers and demonstrate its empirical success in two multi-agent reinforcement learning (MARL) environments.
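
To make the bi-level structure concrete, here is a minimal sketch, not the paper's algorithm. It assumes the simplest possible setting: a single-state game (a payoff matrix A), entropy regularization with temperature tau, and an upper-level loss that pulls the regularized equilibrium toward target strategies. The solver is a stand-in damped softmax best-response iteration, and the gradient comes from implicit differentiation of the equilibrium fixed point. All function names and parameter values are hypothetical.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def solve_regularized_ne(A, tau=1.0, iters=500, damping=0.5):
    """Stand-in black-box solver (an assumption, not the paper's solver).

    Row player x maximizes x^T A y + tau*H(x); column player y minimizes
    x^T A y - tau*H(y). Damped softmax best responses are iterated to a
    fixed point x = softmax(A y / tau), y = softmax(-A^T x / tau).
    """
    m, n = A.shape
    x, y = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    for _ in range(iters):
        x_br = softmax(A @ y / tau)
        y_br = softmax(-A.T @ x / tau)
        x = (1 - damping) * x + damping * x_br
        y = (1 - damping) * y + damping * y_br
    return x, y

def upper_loss(x, y, x_target, y_target):
    # Upper level: squared distance between the equilibrium and a target.
    return 0.5 * np.sum((x - x_target) ** 2) + 0.5 * np.sum((y - y_target) ** 2)

def ne_gradient(A, x, y, x_target, y_target, tau=1.0):
    """Gradient of upper_loss w.r.t. A by implicit differentiation.

    With z = (x, y) and z = F(z, A) at equilibrium, dz/dA = (I - J)^{-1} dF/dA,
    where J is the Jacobian of F in z. Only the solver's output is needed.
    """
    m, n = A.shape
    Sx = np.diag(x) - np.outer(x, x)   # softmax Jacobian at x
    Sy = np.diag(y) - np.outer(y, y)   # softmax Jacobian at y
    J = np.zeros((m + n, m + n))
    J[:m, m:] = Sx @ A / tau
    J[m:, :m] = -Sy @ A.T / tau
    grad_z = np.concatenate([x - x_target, y - y_target])
    w = np.linalg.solve((np.eye(m + n) - J).T, grad_z)   # adjoint solve
    wx, wy = w[:m], w[m:]
    return (np.outer(Sx @ wx, y) - np.outer(x, Sy @ wy)) / tau

def arbitrate(A0, x_target, y_target, tau=1.0, lr=0.5, steps=200):
    # Upper-level gradient descent on the payoff matrix itself.
    A = A0.copy()
    for _ in range(steps):
        x, y = solve_regularized_ne(A, tau)
        A -= lr * ne_gradient(A, x, y, x_target, y_target, tau)
    return A

if __name__ == "__main__":
    A = np.array([[1.0, -1.0], [-0.5, 0.8]])
    x_t, y_t = np.array([0.7, 0.3]), np.array([0.4, 0.6])
    before = upper_loss(*solve_regularized_ne(A), x_t, y_t)
    after = upper_loss(*solve_regularized_ne(arbitrate(A, x_t, y_t)), x_t, y_t)
    print("loss before:", before, "loss after:", after)
```

The gradient step never looks inside the solver; it uses only the returned equilibrium, which is the sense in which the solver is treated as a black box in this sketch. Extending it to multi-state Markov games would require value functions and transition dynamics that this example deliberately leaves out.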

artificial intelligence, machine learning, reinforcement learning

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America > United States
  - Illinois (0.04)
  - New York
    - Suffolk County > Stony Brook (0.04)
    - New York County > New York City (0.04)
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
- Europe
  - United Kingdom > England
    - Greater London > London (0.04)
    - Cambridgeshire > Cambridge (0.04)
  - France > Hauts-de-France
    - Nord > Lille (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - China
    - Shanghai > Shanghai (0.04)
    - Beijing > Beijing (0.04)

Genre:
- Research Report (0.63)

Industry:
- Leisure & Entertainment > Games (0.68)

Technology:
- Information Technology
  - Game Theory (1.00)
  - Artificial Intelligence
    - Representation & Reasoning > Agents (1.00)
    - Machine Learning
      - Reinforcement Learning (1.00)
      - Neural Networks (1.00)
      - Statistical Learning > Gradient Descent (0.46)
      - Learning Graphical Models > Undirected Networks > Markov Models (0.46)
