A Regularized Opponent Model with Maximum Entropy Objective

Zheng Tian, Ying Wen, Zhichen Gong, Faiz Punakkath, Shihao Zou, Jun Wang

arXiv.org Artificial Intelligence 

In the single-agent setting, reinforcement learning (RL) tasks can be cast as an inference problem by introducing a binary random variable o that stands for "optimality". In this paper, we redefine the binary random variable o in the multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound on the likelihood of achieving optimality and name it the Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). ROMMEO yields a novel perspective on opponent modeling, and we show both theoretically and empirically how it improves the performance of trained agents in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method, ROMMEO-Q, with a proof of convergence. We then extend the exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We evaluate the two algorithms on a challenging iterated matrix game and a differential game, respectively, and show that they outperform strong MARL baselines.
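
As background for the abstract, the standard single-agent control-as-inference construction it builds on can be sketched as follows; the notation (reward r, trajectory \tau, policy \pi) is assumed here, and the multi-agent redefinition of o and the ROMMEO bound itself are derived in the paper.

% Sketch of the standard single-agent RL-as-inference setup (assumed notation).
% The optimality variable o_t is tied to the reward:
\[
p(o_t = 1 \mid s_t, a_t) \;\propto\; \exp\!\big(r(s_t, a_t)\big).
\]
% A variational lower bound on the log-likelihood of optimality over a horizon T,
% with variational policy \pi, recovers the maximum-entropy RL objective:
\[
\log p(o_{1:T} = 1) \;\ge\; \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=1}^{T} r(s_t, a_t) + \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\right].
\]

ROMMEO generalizes this bound to the multi-agent case, where the variational distribution additionally includes an opponent model whose regularization term gives the objective its name.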
