MARL Warehouse Robots
Allman, Price, Thang, Lian, Simmons, Dre, Riaz, Salmon
arXiv.org Artificial Intelligence
We investigate how multiple autonomous agents can learn to coordinate and deliver packages in warehouse environments, a problem that requires implicit communication, collision avoidance, and efficient task allocation without centralized control. Traditional warehouse automation relies on centralized planning systems that face scalability limitations. Multi-agent reinforcement learning (MARL) offers an alternative through decentralized learned policies, but it requires solving the credit assignment problem. We compare three MARL algorithms on warehouse coordination: QMIX [Rashid et al., 2018] (value decomposition), IPPO (independent learning), and MASAC (centralized critic). Our study progresses from MPE for validation to RWARE for warehouse evaluation, culminating in a Unity 3D deployment where agents demonstrate learned package delivery behavior. In our systematic comparison, QMIX emerged as the best performer. Our contributions are: (1) a hyperparameter analysis showing that default configurations fail on sparse-reward warehouse tasks, (2) a comparative evaluation across algorithms and scales, (3) a Unity ML-Agents integration demonstrating sim-to-sim transfer with successful package delivery, and (4) identification of scaling challenges. Full experimental details and results are documented in our Quarto documentation book.
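For readers unfamiliar with value decomposition, the following is a minimal sketch of the monotonic mixing idea behind QMIX, not the authors' implementation. It assumes single-layer linear hypernetworks with random, untrained weights; the class name MonotonicMixer and the parameters embed_dim and seed are hypothetical and chosen only for illustration. The point it shows is that taking absolute values of the state-conditioned mixing weights makes the joint value non-decreasing in each agent's own value, which is how QMIX approaches credit assignment.

```python
import numpy as np


def elu(x):
    # ELU activation; it is monotonically increasing, which the mixer relies on.
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)


class MonotonicMixer:
    """Illustrative QMIX-style mixer (hypothetical name, untrained weights)."""

    def __init__(self, n_agents, state_dim, embed_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Assumption: each hypernetwork is a single linear map from the state.
        self.hyper_w1 = rng.normal(size=(state_dim, n_agents * embed_dim))
        self.hyper_b1 = rng.normal(size=(state_dim, embed_dim))
        self.hyper_w2 = rng.normal(size=(state_dim, embed_dim))
        self.hyper_b2 = rng.normal(size=(state_dim, 1))

    def __call__(self, agent_qs, state):
        # Absolute values keep the mixing weights non-negative,
        # so Q_tot cannot decrease when any single agent's Q increases.
        w1 = np.abs(state @ self.hyper_w1).reshape(self.n_agents, self.embed_dim)
        b1 = state @ self.hyper_b1
        hidden = elu(agent_qs @ w1 + b1)
        w2 = np.abs(state @ self.hyper_w2)
        b2 = (state @ self.hyper_b2)[0]
        return float(hidden @ w2 + b2)


mixer = MonotonicMixer(n_agents=3, state_dim=4)
state = np.array([0.5, -1.0, 0.25, 2.0])
agent_qs = np.array([0.1, -0.5, 1.0])
q_tot = mixer(agent_qs, state)
```

Because of this monotonicity, each agent can act greedily on its own value while remaining consistent with the greedy joint action under the mixed value, which is what allows decentralized execution after centralized training.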
Dec-10-2025
- Country:
- North America > United States > Oklahoma > Tulsa County > Tulsa (0.05)
- Genre:
- Research Report (0.65)