Scalable spectral representations for multi-agent reinforcement learning in network MDPs

Zhaolin Ren, Runyu Zhang, Bo Dai, Na Li

arXiv.org Artificial Intelligence 

Multi-agent network systems have found applications in various societal infrastructures, such as power systems, traffic networks, and smart cities [McArthur et al., 2007, Burmeister et al., 1997, Roscia et al., 2013]. One particularly important class of such problems is the cooperative multi-agent network MDP setting, where agents are embedded in a graph and each agent has its own local state [Qu et al., 2020b]. In network MDPs, each agent's local state transition probabilities and rewards depend only on the states and actions of its direct neighbors in the graph. This property arises in a wide variety of cooperative network control problems, ranging from thermal control of multizone buildings [Zhang et al., 2016] and wireless access control [Zocca, 2019] to phase synchronization in electrical grids [Blaabjerg et al., 2006], where agents typically must act and learn based only on information within a local neighborhood due to constraints on the information and communication infrastructure.
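To make the locality property concrete, the following is a minimal sketch (not the paper's implementation) of how a network MDP's global transition factorizes into per-agent local transitions. All names here are hypothetical: a toy line graph over `n` agents with binary local states and actions, and an arbitrary illustrative rule in `local_transition` that reads only the closed neighborhood of agent `i`.

```python
import random

def neighbors(i, n):
    """Closed neighborhood N(i) of agent i on a line graph of n agents."""
    return [j for j in (i - 1, i, i + 1) if 0 <= j < n]

def local_transition(rng, s_nbr, a_nbr):
    """Sample agent i's next binary state using only its neighborhood.

    Toy rule for illustration: the probability of the next state being 1
    is a mix of the neighborhood averages of states and actions. Any
    function of (s_nbr, a_nbr) alone would satisfy the locality property.
    """
    p = 0.5 * (sum(s_nbr) / len(s_nbr)) + 0.5 * (sum(a_nbr) / len(a_nbr))
    return int(rng.random() < p)

def step(rng, s, a):
    """Global transition: each agent i transitions based only on N(i)."""
    n = len(s)
    return [
        local_transition(
            rng,
            [s[j] for j in neighbors(i, n)],
            [a[j] for j in neighbors(i, n)],
        )
        for i in range(n)
    ]

rng = random.Random(0)
s = [0, 1, 0, 1, 0]   # local states of 5 agents
a = [1, 1, 0, 0, 1]   # local actions of 5 agents
s_next = step(rng, s, a)
```

The key point is structural: `local_transition` never sees the full state vector, so the joint transition kernel factorizes across agents, which is what enables scalable, neighborhood-based learning in this setting.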