Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning

Ruize Zhang, Sirui Xiang, Zelai Xu, Feng Gao, Shilong Ji, Wenhao Tang, Wenbo Ding, Chao Yu, Yu Wang

arXiv.org Artificial Intelligence 

Competitive tasks have long served as benchmarks for progress in artificial intelligence. Landmark results have been achieved in domains such as Go [1], poker [2], and real-time strategy games [3], where agents learn to plan, adapt, and compete under structured rules. As research moves from virtual environments to the physical world, robot sports (structured, rule-based competitions involving physical agents) have emerged as a promising frontier for embodied intelligence. Examples include robot soccer [4, 5], table tennis [6, 7], and multi-drone pursuit-evasion [8], all of which combine high-level strategy with low-level motion control in physically grounded settings.

In this paper, we tackle a new embodied competitive task proposed by the VolleyBots testbed [9]: 3v3 multi-drone volleyball. This task exemplifies the structure of a robot sport (well-defined objectives, explicit rules, and head-to-head competition) while presenting a set of unique and underexplored challenges. Each team must coordinate three quadrotors to rally a ball over a net, switching roles dynamically between offense and defense in a turn-based fashion. The environment is highly dynamic and demands precise timing, agile 3D maneuvering, and strategic team-level behavior. The turn-based nature of ball exchange introduces long-horizon temporal dependencies; the multi-agent setting requires tightly coupled tactics; and the underactuated dynamics of quadrotors call for fine-grained, reactive motor skills.
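To make the underactuation point concrete, the following is a minimal planar quadrotor sketch (not the paper's simulation model, and unrelated to the VolleyBots testbed's actual dynamics): two control inputs, total thrust and pitch torque, drive three degrees of freedom (horizontal position, altitude, attitude). Lateral motion is only reachable indirectly, by first tilting the body, which is why low-level control must be fine-grained and reactive. All constants and the `step` function here are illustrative assumptions.

```python
import math

# Illustrative planar quadrotor: state (x, z, theta, vx, vz, omega),
# two inputs (thrust, torque) for three degrees of freedom -> underactuated.
M, I, G, DT = 1.0, 0.01, 9.81, 0.01  # mass [kg], inertia, gravity, timestep [s]

def step(state, thrust, torque):
    """One explicit-Euler step of the simplified dynamics."""
    x, z, th, vx, vz, om = state
    ax = -thrust * math.sin(th) / M       # lateral accel exists only when tilted
    az = thrust * math.cos(th) / M - G    # vertical accel fights gravity
    al = torque / I                       # angular accel from torque
    return (x + vx * DT, z + vz * DT, th + om * DT,
            vx + ax * DT, vz + az * DT, om + al * DT)

# Hover: with zero tilt, thrust M*G exactly cancels gravity and the
# state is a fixed point; any lateral target requires tilting first.
s = (0.0, 1.0, 0.0, 0.0, 0.0, 0.0)
for _ in range(100):
    s = step(s, M * G, 0.0)
```

Under this toy model, hovering leaves the state unchanged, while moving sideways forces a transient through the attitude dynamics, illustrating the coupling that low-level motor skills must handle.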