Agent-Time Attention for Sparse Rewards Multi-Agent Reinforcement Learning
Jennifer She, Jayesh K. Gupta, Mykel J. Kochenderfer
–arXiv.org Artificial Intelligence
Cooperative multi-agent reinforcement learning (MARL), where a team of agents learns coordinated policies optimizing a global team reward, has been extensively studied in recent years [25, 13] and finds potential applications in a wide variety of domains such as robot swarm control [15, 2], coordinating autonomous drivers [26, 41], and network routing [38, 4]. Although a cooperative MARL problem can be framed as a centralized single-agent problem, with the team acting as a single actor over the joint action space, this approach does not scale well: the joint action space grows exponentially with the number of agents. Moreover, due to real-world constraints on communication and observability, such a framing is often impractical for many real-world applications. Unfortunately, simply learning decentralized policies independently from local observations results in unstable learning and convergence issues due to the non-stationarity introduced by simultaneous exploration [12, 33]. This has led MARL methods to focus on the centralized training, decentralized execution (CTDE) paradigm, in which decentralized policies can access extra state information during training but not during execution.
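The scaling argument above can be made concrete with a minimal sketch (not from the paper): under the centralized framing, the joint action space is the Cartesian product of the per-agent action spaces, so its size is |A|^n for n agents with |A| actions each.

```python
# Illustrative sketch: joint action space size under a centralized
# single-agent framing of a cooperative MARL problem.
# The function name and parameters are assumptions for illustration.

def joint_action_space_size(n_agents: int, actions_per_agent: int) -> int:
    """Size of the joint action space: |A|^n, exponential in n_agents."""
    return actions_per_agent ** n_agents

# With just 5 actions per agent, the joint space explodes quickly:
for n in (2, 5, 10):
    print(n, joint_action_space_size(n, 5))
# 2 agents ->        25 joint actions
# 5 agents ->     3,125 joint actions
# 10 agents -> 9,765,625 joint actions
```

This exponential blow-up is why CTDE methods keep execution decentralized, with each agent selecting from its own (small) action space.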
Oct-31-2022