Beyers, Louise
Performant, Memory Efficient and Scalable Multi-Agent Reinforcement Learning
Mahjoub, Omayma, Abramowitz, Sasha, de Kock, Ruan, Khlifi, Wiem, Toit, Simon du, Daniel, Jemma, Nessir, Louay Ben, Beyers, Louise, Formanek, Claude, Clark, Liam, Pretorius, Arnu
As the field of multi-agent reinforcement learning (MARL) progresses towards larger and more complex environments, achieving strong performance while maintaining memory efficiency and scalability to many agents becomes increasingly important. Although recent research has led to several advanced algorithms, to date, none fully address all of these key properties simultaneously. In this work, we introduce Sable, a novel and theoretically sound algorithm that adapts the retention mechanism from Retentive Networks to MARL. Sable's retention-based sequence modelling architecture allows for computationally efficient scaling to a large number of agents, as well as maintaining a long temporal context, making it well-suited for large-scale partially observable environments. Through extensive evaluations across six diverse environments, we demonstrate that Sable significantly outperforms existing state-of-the-art methods in the majority of tasks (34 out of 45, roughly 75%). Furthermore, Sable demonstrates stable performance as we scale the number of agents, handling environments with more than a thousand agents while exhibiting a linear increase in memory usage. Finally, we conduct ablation studies to isolate the source of Sable's performance gains and confirm its efficient computational memory usage. Our results highlight Sable's performance and efficiency, positioning it as a leading approach to MARL at scale.

When considering large-scale practical applications of multi-agent reinforcement learning (MARL), such as autonomous driving (Lian & Deshmukh, 2006; Zhou et al., 2021; Li et al., 2022) and electricity grid control (Kamboj et al., 2011; Li et al., 2016), three key properties become increasingly important for a system to be effective: strong performance, memory efficiency, and scalability to many agents. Although many existing MARL approaches exhibit one or two of these properties, a solution that effectively encompasses all three remains elusive. To briefly illustrate this point, consider the spectrum of approaches to MARL, ranging from fully decentralised to fully centralised. Fully decentralised algorithms handle many agents in a memory-efficient way, typically by sharing parameters across agents and conditioning on an agent identifier. However, at scale, the performance of fully decentralised methods remains suboptimal compared to more centralised approaches (Papoudakis et al., 2021; Yu et al., 2022; Wen et al., 2022). Between decentralised and centralised methods lie approaches based on centralised training with decentralised execution (CTDE) (Lowe et al., 2017; Papoudakis et al., 2021; Yu et al., 2022).
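The abstract above attributes Sable's memory efficiency and long temporal context to the retention mechanism adapted from Retentive Networks. As a hedged illustration only (not Sable's actual architecture, whose per-agent sequence handling is not detailed here), the sketch below shows the generic recurrent form of single-head retention, in which a fixed-size state summarises the past so memory does not grow with sequence length; names such as `recurrent_retention` are illustrative.

```python
import numpy as np

def recurrent_retention(q, k, v, gamma):
    """Recurrent form of retention (as in Retentive Networks), one step at a time.

    q, k, v: arrays of shape (T, d) -- per-step query/key/value projections.
    gamma:   scalar decay in (0, 1) controlling how quickly past context fades.
    Memory is O(d * d) regardless of sequence length T, which is the property
    that makes retention attractive for long contexts and many agents.
    """
    T, d = q.shape
    state = np.zeros((d, d))      # running summary of past key-value outer products
    outputs = np.zeros((T, d))
    for t in range(T):
        state = gamma * state + np.outer(k[t], v[t])   # decay old context, add new step
        outputs[t] = q[t] @ state                       # read the state with the current query
    return outputs

# Toy usage: 8 timesteps, 4-dimensional projections.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 4)) for _ in range(3))
out = recurrent_retention(q, k, v, gamma=0.9)
print(out.shape)  # (8, 4)
```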
Coordination Failure in Cooperative Offline MARL
Tilbury, Callum Rhys, Formanek, Claude, Beyers, Louise, Shock, Jonathan P., Pretorius, Arnu
Offline multi-agent reinforcement learning (MARL) leverages static datasets of experience to learn optimal multi-agent control. However, learning from static data presents several unique challenges to overcome. In this paper, we focus on coordination failure and investigate the role of joint actions in multi-agent policy gradients with offline data, focusing on a common setting we refer to as the 'Best Response Under Data' (BRUD) approach. By using two-player polynomial games as an analytical tool, we demonstrate a simple yet overlooked failure mode of BRUD-based algorithms, which can lead to catastrophic coordination failure in the offline setting. Building on these insights, we propose an approach to mitigate such failure by prioritising samples from the dataset based on joint-action similarity during policy learning, and demonstrate its effectiveness in detailed experiments. More generally, however, we argue that prioritised dataset sampling is a promising area for innovation in offline MARL that can be combined with other effective approaches such as critic and policy regularisation. Importantly, our work shows how insights drawn from simplified, tractable games can lead to useful, theoretically grounded insights that transfer to more complex contexts. A core dimension of our offering is an interactive notebook, from which almost all of our results can be reproduced in a browser.
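The abstract describes mitigating BRUD-style coordination failure by prioritising dataset samples according to joint-action similarity. The sketch below is a minimal, assumed illustration of one such weighting (a softmax over negative distance between stored joint actions and the joint action of the current policies); the name `similarity_weights` and the exact weighting scheme are illustrative assumptions rather than the method described in the paper.

```python
import numpy as np

def similarity_weights(dataset_joint_actions, policy_joint_action, temperature=1.0):
    """Toy prioritisation: weight each stored transition by how close its joint
    action is to the joint action the current policies would take.

    dataset_joint_actions: (N, n_agents, act_dim) joint actions stored in the dataset.
    policy_joint_action:   (n_agents, act_dim) joint action under the current policies.
    Returns a probability vector over the N transitions.
    """
    diff = dataset_joint_actions - policy_joint_action[None]       # broadcast over N
    scores = -np.sum(diff ** 2, axis=(1, 2)) / temperature         # closer joint actions score higher
    scores -= scores.max()                                          # numerical stability before softmax
    probs = np.exp(scores)
    return probs / probs.sum()

# Sample a prioritised minibatch of 32 transitions from a toy dataset.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3, 2))        # 1000 transitions, 3 agents, 2-dim continuous actions
current = rng.normal(size=(3, 2))
p = similarity_weights(data, current)
batch_idx = rng.choice(len(data), size=32, p=p)
```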
Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation
Formanek, Claude, Tilbury, Callum Rhys, Beyers, Louise, Shock, Jonathan, Pretorius, Arnu
Offline multi-agent reinforcement learning (MARL) is an emerging field with great promise for real-world applications. Unfortunately, the current state of research in offline MARL is plagued by inconsistencies in baselines and evaluation protocols, which ultimately makes it difficult to accurately assess progress, trust newly proposed innovations, and allow researchers to easily build upon prior work. In this paper, we first identify significant shortcomings in existing methodologies for measuring the performance of novel algorithms through a representative study of published offline MARL work. Secondly, by directly comparing to this prior work, we demonstrate that simple, well-implemented baselines can achieve state-of-the-art (SOTA) results across a wide range of tasks. Specifically, we show that on 35 out of 47 datasets used in prior work (almost 75% of cases), we match or surpass the performance of the current purported SOTA. Strikingly, our baselines often substantially outperform these more sophisticated algorithms. Finally, we correct for the shortcomings highlighted in this prior work by introducing a straightforward standardised methodology for evaluation and by providing our baseline implementations with statistically robust results across several scenarios, useful for comparisons in future work. Our proposal includes simple and sensible steps that are easy to adopt, which, in combination with solid baselines and comparative results, could substantially improve the overall rigour of empirical science in offline MARL moving forward.
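The abstract calls for a standardised evaluation methodology with statistically robust results across several scenarios. As a hedged sketch of the kind of aggregation commonly used for this purpose, and not necessarily the exact protocol proposed in the paper, the snippet below reports an interquartile mean over independent runs with a percentile-bootstrap confidence interval; the helper names `iqm` and `bootstrap_ci` are illustrative.

```python
import numpy as np

def iqm(scores):
    """Interquartile mean: mean of the middle 50% of scores, a robust aggregate."""
    s = np.sort(np.asarray(scores))
    n = len(s)
    return s[n // 4 : n - n // 4].mean()

def bootstrap_ci(scores, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the IQM over independent runs."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores)
    stats = [iqm(rng.choice(scores, size=len(scores), replace=True)) for _ in range(n_boot)]
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return iqm(scores), (lo, hi)

# Ten independent seeds of normalised episode return on one dataset (toy numbers).
returns = [0.81, 0.77, 0.92, 0.68, 0.85, 0.74, 0.88, 0.79, 0.83, 0.71]
point, (low, high) = bootstrap_ci(returns)
print(f"IQM = {point:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```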