Policy Evaluation and Seeking for Multi-Agent Reinforcement Learning via Best Response

Yan, Rui, Duan, Xiaoming, Shi, Zongying, Zhong, Yisheng, Marden, Jason R., Bullo, Francesco

Jun-20-2020–arXiv.org Machine Learning

This paper introduces two metrics, one cycle-based and one memory-based, for the evaluation, ranking, and computation of policies in multi-agent learning. Both are grounded in a dynamical game-theoretic solution concept called sink equilibrium. We adopt strict best response dynamics (SBRD) to model selfish behaviors at a meta-level for multi-agent reinforcement learning. Our approach can handle dynamical cyclical behaviors (unlike approaches based on Nash equilibria and Elo ratings), and is more compatible with single-agent reinforcement learning than alpha-rank, which relies on weakly better responses. We first consider settings where the difference between the largest and second-largest values of the underlying metric has a known lower bound. With this knowledge, we propose a class of perturbed SBRD with the following property: for a broad class of stochastic games with finite memory, only policies with maximum metric are observed with nonzero probability. We then consider settings where the lower bound on this difference is unknown. For this setting, we propose a class of perturbed SBRD such that the metrics of the policies observed with nonzero probability differ from the optimal metric by no more than any given tolerance. The proposed perturbed SBRD addresses opponent-induced non-stationarity by fixing the strategies of the other agents while one agent learns, and uses empirical game-theoretic analysis to estimate the payoffs of each strategy profile obtained due to the perturbation.
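To make the notions of strict best response dynamics and sink equilibrium concrete, the following is a minimal sketch for a two-player matrix game. It is not the authors' implementation. It assumes the common definition in which a joint strategy has an edge to every profile obtained when one agent switches to a best response that strictly improves its payoff, and a sink equilibrium is a strongly connected component of that graph with no outgoing edges. The function names (`sbrd_edges`, `reachable`, `sink_equilibria`) and the two example games are illustrative choices, not taken from the paper.

```python
from itertools import product


def sbrd_edges(A, B):
    """Build the strict best response graph of a two-player game.

    A[r][c] is the row player's payoff and B[r][c] the column player's
    payoff at joint strategy (r, c). An edge s -> s' exists when exactly
    one player switches to a best response that strictly improves its
    own payoff (assumed definition, see the lead-in).
    """
    n_rows, n_cols = len(A), len(A[0])
    edges = {s: set() for s in product(range(n_rows), range(n_cols))}
    for r, c in edges:
        # Row player deviates while the column strategy c stays fixed.
        best_row = max(A[i][c] for i in range(n_rows))
        if best_row > A[r][c]:
            edges[(r, c)].update(
                (i, c) for i in range(n_rows) if A[i][c] == best_row
            )
        # Column player deviates while the row strategy r stays fixed.
        best_col = max(B[r][j] for j in range(n_cols))
        if best_col > B[r][c]:
            edges[(r, c)].update(
                (r, j) for j in range(n_cols) if B[r][j] == best_col
            )
    return edges


def reachable(edges, start):
    """Return all nodes reachable from start (including start itself)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in edges[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def sink_equilibria(edges):
    """Return the sink strongly connected components as frozensets.

    A node lies in a sink component exactly when every node it can
    reach can also reach it back; the component is then its reach set.
    """
    reach = {s: reachable(edges, s) for s in edges}
    return {
        frozenset(reach[s])
        for s in edges
        if all(s in reach[t] for t in reach[s])
    }


# Prisoner's dilemma (0 = cooperate, 1 = defect), illustrative payoffs.
A_pd = [[3, 0], [5, 1]]
B_pd = [[3, 5], [0, 1]]

# Matching pennies: zero-sum, no pure Nash equilibrium.
A_mp = [[1, -1], [-1, 1]]
B_mp = [[-1, 1], [1, -1]]

pd_sinks = sink_equilibria(sbrd_edges(A_pd, B_pd))
mp_sinks = sink_equilibria(sbrd_edges(A_mp, B_mp))
```

In the prisoner's dilemma the only sink is the single profile of mutual defection, which coincides with the pure Nash equilibrium. In matching pennies all four joint strategies form one cycle, so the sink equilibrium is that whole cycle. This is the kind of cyclical behavior that a pure Nash equilibrium analysis cannot capture.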

agent, joint strategy, sink equilibrium

Country:
- South America > Brazil
  - São Paulo (0.04)
- North America
  - United States
    - Massachusetts > Hampshire County
      - Amherst (0.04)
    - California
      - Santa Barbara County > Santa Barbara (0.04)
      - San Francisco County > San Francisco (0.04)
  - Canada
    - British Columbia
      - Metro Vancouver Regional District > Vancouver (0.04)
      - Vancouver Island > Capital Regional District
        - Victoria (0.04)
    - Alberta > Census Division No. 11
      - Edmonton Metropolitan Region > Edmonton (0.04)
- Europe
  - Spain (0.04)
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)
- Asia > China
  - Beijing > Beijing (0.04)

Genre:
- Research Report (0.40)

Industry:
- Leisure & Entertainment > Games (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents (1.00)
  - Machine Learning > Reinforcement Learning (1.00)
