 ma2c


Traffic Signal Control with Communicative Deep Reinforcement Learning Agents: a Case Study

Fazzini, Paolo, Wheeler, Isaac, Petracchini, Francesco

arXiv.org Artificial Intelligence

In this work we theoretically and experimentally analyze Multi-Agent Advantage Actor-Critic (MA2C) and Independent Advantage Actor-Critic (IA2C), two recently proposed multi-agent reinforcement learning methods that can be applied to control traffic signals in urban areas. The two methods differ in whether the reward is calculated locally or globally and in how communication among agents is managed. We analyze the methods theoretically within the framework of non-Markov decision processes, which offers useful insights into the algorithms. Moreover, we analyze the efficacy and robustness of the methods experimentally by testing them in two traffic areas of Bologna (Italy), simulated with the SUMO software tool. The experimental results indicate that MA2C achieves the best performance in the majority of cases, outperforms the alternative method considered, and displays sufficient stability during the learning process.
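The distinction between a locally and a globally calculated reward can be illustrated with a minimal sketch. It is not taken from the paper: the abstract does not say which method uses which reward, and the choice of queue length as the cost signal, the function names, and the intersection identifiers are all assumptions made for illustration.

```python
from typing import Dict


def local_rewards(queue_lengths: Dict[str, float]) -> Dict[str, float]:
    """Each agent is rewarded only for its own intersection (assumed cost: queue length)."""
    return {agent: -queue for agent, queue in queue_lengths.items()}


def global_reward(queue_lengths: Dict[str, float]) -> float:
    """Every agent receives the same network-wide reward (negative total queue)."""
    return -sum(queue_lengths.values())


# Hypothetical queue lengths (vehicles) at three intersections.
queues = {"int_a": 3.0, "int_b": 5.0, "int_c": 0.0}
per_agent = local_rewards(queues)
shared = global_reward(queues)
```

With a local reward, the agent at `int_c` sees no cost even though its neighbours are congested; with a global reward, all three agents see the same total cost and cannot tell whose action caused it.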


Continuous Multiagent Control using Collective Behavior Entropy for Large-Scale Home Energy Management

Sun, Jianwen, Zheng, Yan, Hao, Jianye, Meng, Zhaopeng, Liu, Yang

arXiv.org Artificial Intelligence

With the increasing popularity of electric vehicles, distributed energy generation and storage facilities in smart grid systems, efficient Demand-Side Management (DSM) is urgently needed for energy savings and peak-load reduction. Traditional DSM works that focus on optimizing the energy activities of a single household cannot scale up to large-scale home energy management problems. Multi-agent Deep Reinforcement Learning (MA-DRL) offers a potential way to solve the scalability problem, where modern homes interact to reduce energy consumption while striking a balance between energy cost and peak-load reduction. However, such an environment is difficult to solve because of its non-stationarity, and existing MA-DRL approaches cannot effectively give incentives for the expected group behavior. In this paper, we propose a collective MA-DRL algorithm with a continuous action space to provide fine-grained control on a large-scale microgrid. To mitigate the non-stationarity of the microgrid environment, a novel predictive model is proposed to measure the collective market behavior. In addition, a collective behavior entropy is introduced to reduce the high peak loads incurred by the collective behaviors of all householders in the smart grid. Empirical results show that our approach significantly outperforms the state-of-the-art methods regarding power cost reduction and daily peak-load optimization.


Multi-Agent Deep Reinforcement Learning for Large-scale Traffic Signal Control

Chu, Tianshu, Wang, Jie, Codecà, Lara, Li, Zhaojian

arXiv.org Machine Learning

Reinforcement learning (RL) is a promising data-driven approach for adaptive traffic signal control (ATSC) in complex urban traffic networks, and deep neural networks further enhance its learning power. However, centralized RL is infeasible for large-scale ATSC due to the extremely high dimension of the joint action space. Multi-agent RL (MARL) overcomes the scalability issue by distributing the global control to each local RL agent, but it introduces new challenges: now the environment becomes partially observable from the viewpoint of each local agent due to limited communication among agents. Most existing studies in MARL focus on designing efficient communication and coordination among traditional Q-learning agents. This paper presents, for the first time, a fully scalable and decentralized MARL algorithm for the state-of-the-art deep RL agent: advantage actor critic (A2C), within the context of ATSC. In particular, two methods are proposed to stabilize the learning procedure, by improving the observability and reducing the learning difficulty of each local agent. The proposed multi-agent A2C is compared against independent A2C and independent Q-learning algorithms, in both a large synthetic traffic grid and a large real-world traffic network of Monaco city, under simulated peak-hour traffic dynamics. Results demonstrate its optimality, robustness, and sample efficiency over other state-of-the-art decentralized MARL algorithms.
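The scalability argument above (a joint action space of extremely high dimension under centralized control) can be made concrete with a small arithmetic sketch. The figures used here, 25 intersections with 5 signal phases each, are illustrative assumptions and are not taken from the abstract; the function names are hypothetical.

```python
def joint_action_count(num_intersections: int, phases_per_intersection: int) -> int:
    """Number of joint actions a single centralized controller must choose among."""
    return phases_per_intersection ** num_intersections


def decentralized_action_count(num_intersections: int, phases_per_intersection: int) -> int:
    """Total number of action outputs when each intersection has its own local agent."""
    return num_intersections * phases_per_intersection


# Illustrative numbers only: 25 intersections, 5 phases each.
centralized = joint_action_count(25, 5)            # 5**25 = 298023223876953125
decentralized = decentralized_action_count(25, 5)  # 25 * 5 = 125
```

The centralized count grows exponentially with the number of intersections, while the decentralized count grows linearly, which is the motivation for distributing control across local agents.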