Multi-Agent Trust Region Policy Optimization

Oct-18-2020–arXiv.org Artificial Intelligence

We extend trust region policy optimization (TRPO) to multi-agent reinforcement learning (MARL) problems. We show that the policy update of TRPO can be transformed into a distributed consensus optimization problem for multi-agent cases. By making a series of approximations to the consensus optimization model, we propose a decentralized MARL algorithm, which we call multi-agent TRPO (MATRPO). This algorithm can optimize distributed policies based on local observations and private rewards. The agents do not need to know observations, rewards, policies or value/action-value functions of other agents. The agents only share a likelihood ratio with their neighbors during the training process. The algorithm is fully decentralized and privacy-preserving. Our experiments on two cooperative games demonstrate its robust performance on complicated MARL tasks.
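The abstract says the agents exchange only a likelihood ratio with their neighbors. Below is a minimal sketch of why neighbor-only exchanges can be enough. It assumes the joint policy factorizes over agents as π(a|s) = ∏_i π_i(a_i|o_i). Under that assumption, the joint TRPO likelihood ratio is the product of the per-agent ratios r_i, so its logarithm is Σ_i log r_i. The sketch uses plain average consensus with Metropolis weights. This is a standard decentralized averaging scheme swapped in for illustration; it is not MATRPO's actual update. The function names, communication graph, and ratio values are all hypothetical.

```python
import math

# Hypothetical illustration: NOT the MATRPO update from the paper.
# Assumes a factorized joint policy, so log(joint ratio) = sum_i log r_i.

def metropolis_weights(neighbors):
    """Doubly stochastic weights for an undirected graph {agent: set(neighbors)}."""
    n = len(neighbors)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            # Symmetric Metropolis-Hastings weight between neighbors i and j.
            w[i][j] = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
        # Self-weight makes each row sum to 1 (and stays strictly positive).
        w[i][i] = 1.0 - sum(w[i][j] for j in neighbors[i])
    return w

def consensus_joint_log_ratio(local_log_ratios, neighbors, iters=200):
    """Each agent repeatedly mixes its value with its neighbors' values only.

    On a connected graph the values converge to the network-wide mean,
    so n * mean recovers sum_i log r_i at every agent.
    """
    n = len(local_log_ratios)
    w = metropolis_weights(neighbors)
    x = list(local_log_ratios)
    for _ in range(iters):
        # Agent i reads only its own value and those of its neighbors.
        x = [w[i][i] * x[i] + sum(w[i][j] * x[j] for j in neighbors[i])
             for i in range(n)]
    return [n * xi for xi in x]

# Usage: four agents on a path graph 0-1-2-3 with hypothetical local ratios
# r_i = pi_i_new(a_i | o_i) / pi_i_old(a_i | o_i).
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
local_ratios = [1.2, 0.9, 1.1, 0.8]
estimates = consensus_joint_log_ratio([math.log(r) for r in local_ratios], neighbors)
joint_ratios = [math.exp(e) for e in estimates]
# each entry is approximately 0.9504 (= 1.2 * 0.9 * 1.1 * 0.8)
```

Each agent could then plug its estimate of the joint ratio into a TRPO-style surrogate (ratio times advantage). The agent never needs another agent's observations, rewards, or policy parameters to do this.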

agent, artificial intelligence, machine learning
