DACOM: Learning Delay-Aware Communication for Multi-Agent Reinforcement Learning

Yuan, Tingting; Chung, Hwei-Ming; Yuan, Jie; Fu, Xiaoming

arXiv.org Artificial Intelligence 

... improves its policy iteratively by learning from observations to achieve a given goal. RL, with a single agent to decide the behavior of all entities, faces various challenges, such as scalability (Yan et al. 2021) and privacy issues (Yuan, Chung, and Fu 2022). To this end, the extension from single-agent RL to multi-agent RL (MARL) (Hernandez-Leal, Kartal, and Taylor 2019) is favorable. MARL (Hernandez-Leal, Kartal, and Taylor 2019) has been widely used in various tasks, such as real-time resource allocation (Yuan et al. ...

Secondly, the communication delay can interfere with the cooperation between agents by introducing delays in action-making (Chen et al. 2021) and uncertainty on the arrival time of information. Previous work (Kim et al. 2019) prevents endless waiting by setting a predefined and constant bound for the waiting time, but it may restrain potential cooperation if it is set too short and conversely may cause meaningless waiting. Therefore, such a constant timer is inflexible and cannot be adapted to the dynamics in the communication networks.
