Multiagent Soft Q-Learning
Wei, Ermo, Wicke, Drew, Freelan, David, Luke, Sean
arXiv.org Artificial Intelligence
Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space and, as we show, are susceptible to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint-action space.
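Relative overgeneralization can be illustrated with a small discrete example. The sketch below is not taken from the paper, which studies continuous games; the 3x3 shared-reward payoff matrix and the helper names are assumptions chosen for illustration. Each agent ranks its own actions by their average payoff over a uniformly exploring partner, which leads both agents to a "safe" action even though a different joint action pays more.

```python
# Illustrative shared-reward matrix game (assumed payoffs, not from the paper).
# Rows are agent 0's actions, columns are agent 1's actions.
PAYOFF = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def average_return_per_action(payoff, agent):
    """Average payoff of each of `agent`'s actions when the partner acts uniformly at random."""
    n = len(payoff)
    if agent == 0:
        return [sum(payoff[a][b] for b in range(n)) / n for a in range(n)]
    return [sum(payoff[a][b] for a in range(n)) / n for b in range(n)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def joint_optimum(payoff):
    """Joint action (a, b) with the highest shared payoff."""
    n = len(payoff)
    pairs = ((a, b) for a in range(n) for b in range(n))
    return max(pairs, key=lambda ab: payoff[ab[0]][ab[1]])


# Each agent independently picks the action with the best average payoff.
independent_choice = (
    argmax(average_return_per_action(PAYOFF, 0)),
    argmax(average_return_per_action(PAYOFF, 1)),
)
best_joint = joint_optimum(PAYOFF)

print("independent choice:", independent_choice,
      "payoff:", PAYOFF[independent_choice[0]][independent_choice[1]])
print("joint optimum:", best_joint,
      "payoff:", PAYOFF[best_joint[0]][best_joint[1]])
```

In this toy game the high-payoff joint action is surrounded by heavy penalties, so its average looks poor to each agent in isolation. That is the pathology the abstract attributes to local search in the joint-action space.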
Apr-25-2018
- Country:
- North America > United States (0.46)
- Genre:
- Research Report > Promising Solution (0.34)
- Technology: