Deploying Ten Thousand Robots: Scalable Imitation Learning for Lifelong Multi-Agent Path Finding

Jiang, He, Wang, Yutong, Veerapaneni, Rishi, Duhan, Tanishq, Sartoretti, Guillaume, Li, Jiaoyang

arXiv.org Artificial Intelligence 

Abstract-- Lifelong Multi-Agent Path Finding (LMAPF) is a variant of MAPF where agents are continually assigned new goals, necessitating frequent re-planning to accommodate these dynamic changes. Recently, this field has embraced learning-based methods, which reactively generate single-step actions based on individual local observations. However, it is still challenging for them to match the performance of the best search-based algorithms, especially in large-scale settings. This work proposes an imitation-learning-based LMAPF solver that introduces a novel communication module and systematic single-step collision resolution and global guidance techniques. Details are given in Table III. However, most learning-based solvers have only been tested on small-scale instances involving tens I. Multi-Agent Path Finding (MAPF) [1] is the problem of Additionally, most learning papers emphasize the scalability finding collision-free paths on a given graph for a set of of their solvers compared to optimal or boundedsuboptimal agents, each assigned a start and goal location. This is largely because these search-based new goals to agents that reach their current ones. The main solvers struggle with computational complexity, as solving target of LMAPF is to maximize the throughput, which MAPF optimally is NP-hard.