Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning

Emile Anand, Ishani Karmarkar, Guannan Qu

arXiv.org, Artificial Intelligence

Reinforcement Learning (RL) has become a popular framework for solving sequential decision-making problems in unknown environments, and has achieved tremendous success in a wide array of domains such as playing the game of Go (Silver et al., 2016), robotic control (Kober et al., 2013), and autonomous driving (Kiran et al., 2022; Lin et al., 2023). A critical feature of most real-world systems is their uncertain nature, and consequently RL has emerged as a powerful tool for learning optimal policies for multi-agent systems operating in unknown environments (Kim & Giannakis, 2017; Zhang et al., 2021; Lin et al., 2024; Anand & Qu, 2024). While the early literature on RL predominantly focused on the single-agent setting, multi-agent reinforcement learning (MARL) has also recently achieved impressive successes in a broad range of areas, such as coordination of robotic swarms (Preiss et al., 2017), self-driving vehicles (DeWeese & Qu, 2024), real-time bidding (Jin et al., 2018), ride-sharing (Li et al., 2019), and stochastic games (Jin et al., 2020). Despite this growing interest, extending RL to multi-agent settings poses significant computational challenges due to the curse of dimensionality (Sayin et al., 2021): even if each agent's individual state or action space is small, the global state and action spaces can be exponentially large in the number of agents.
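To make this blow-up concrete (a standard illustration, not taken from the paper itself): if each of $n$ agents has a local state space $\mathcal{S}_i$ with $|\mathcal{S}_i| = s$, the global state space is the product of the local ones, so

$$|\mathcal{S}| \;=\; |\mathcal{S}_1 \times \cdots \times \mathcal{S}_n| \;=\; s^{n}.$$

Even modest values are intractable: with $s = 10$ local states per agent and $n = 20$ agents, the joint space already contains $10^{20}$ states, far beyond what tabular methods can enumerate.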