PIC: Permutation Invariant Critic for Multi-Agent Deep Reinforcement Learning
Liu, Iou-Jen, Yeh, Raymond A., Schwing, Alexander G.
Single-agent deep reinforcement learning has achieved impressive performance in many domains, including playing Go [1, 2] and Atari games [3, 4]. However, many real-world problems, such as traffic congestion reduction [5, 6], antenna tilt control [7], and dynamic resource allocation [8], are more naturally modeled as multi-agent systems. Unfortunately, directly applying single-agent reinforcement learning to each agent in a multi-agent system does not yield satisfactory performance [9, 10]. In particular, in multi-agent reinforcement learning [8, 10-19], estimating the value function is challenging because the environment is non-stationary from the perspective of an individual agent [10, 11]. To alleviate this issue, multi-agent deep deterministic policy gradient (MADDPG) [10] recently proposed a centralized critic whose input is the concatenation of all agents' observations and actions.
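To make the centralized-critic input concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The function names `centralized_critic_input` and `linear_critic`, the observations-then-actions ordering, and the toy weights are all assumptions chosen for illustration. It builds the concatenated input for two agents and shows that listing the same agents in a different order produces a different input vector, which is the order sensitivity a permutation invariant critic is meant to avoid.

```python
from typing import List, Sequence


def centralized_critic_input(
    observations: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
) -> List[float]:
    """Concatenate all agents' observations, then all agents' actions.

    The exact ordering (observations first, then actions) is an assumption
    made for this sketch.
    """
    if len(observations) != len(actions):
        raise ValueError("need one action per agent observation")
    flat: List[float] = []
    for obs in observations:
        flat.extend(obs)
    for act in actions:
        flat.extend(act)
    return flat


def linear_critic(x: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical stand-in for a learned critic: a plain dot product."""
    return sum(w * v for w, v in zip(weights, x))


# Two agents, each with a 2-dimensional observation and a 1-dimensional action.
obs = [[1.0, 0.0], [0.0, 2.0]]
acts = [[0.5], [-0.5]]

x = centralized_critic_input(obs, acts)
# Same agents and same joint state, listed in the opposite order.
x_swapped = centralized_critic_input(obs[::-1], acts[::-1])

weights = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # arbitrary illustrative weights
q = linear_critic(x, weights)
q_swapped = linear_critic(x_swapped, weights)

print(x)
print(x_swapped)
print(q, q_swapped)
```

Because the two input vectors contain the same values in different positions, a critic that consumes the raw concatenation can assign different values to what is physically the same joint state.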
Oct-31-2019
- Country:
- North America > United States
- Illinois > Champaign County > Champaign (0.04)
- Asia > Japan
- Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
- Genre:
- Research Report
- New Finding (0.46)
- Experimental Study (0.30)
- Industry:
- Leisure & Entertainment > Games > Computer Games (0.68)
- Technology: