A Regularized Opponent Model with Maximum Entropy Objective
Tian, Zheng, Wen, Ying, Gong, Zhichen, Punakkath, Faiz, Zou, Shihao, Wang, Jun
arXiv.org Artificial Intelligence
In a single-agent setting, reinforcement learning (RL) tasks can be cast as an inference problem by introducing a binary random variable o, which stands for "optimality". In this paper, we redefine the binary random variable o in the multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound on the likelihood of achieving optimality and name it the Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel perspective on opponent modeling and show, theoretically and empirically, how it can improve the performance of training agents in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method, ROMMEO-Q, with a proof of convergence. We then extend the exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We evaluate these two algorithms on a challenging iterated matrix game and a differential game, respectively, and show that they can outperform strong MARL baselines.
May-17-2019
- Country:
  - North America
    - Canada > Alberta (0.14)
    - United States
      - New York > New York County
        - New York City (0.04)
      - California
        - San Francisco County > San Francisco (0.14)
        - San Mateo County > Menlo Park (0.04)
- Genre:
  - Research Report (0.40)
- Industry:
  - Leisure & Entertainment > Games > Computer Games (0.34)
- Technology: