Decentralized Consensus Inference-based Hierarchical Reinforcement Learning for Multi-Constrained UAV Pursuit-Evasion Game
Yuming Xiang, Sizhao Li, Rongpeng Li, Zhifeng Zhao, Honggang Zhang
arXiv.org Artificial Intelligence
Multiple quadrotor unmanned aerial vehicle (UAV) systems have garnered widespread research interest and fostered numerous applications, especially in multi-constrained pursuit-evasion games (MC-PEG). The Cooperative Evasion and Formation Coverage (CEFC) task, in which the UAV swarm aims to maximize formation coverage across multiple target zones while collaboratively evading predators, is one of the most challenging problems in MC-PEG, especially under limited communication. This multifaceted problem, which intertwines responses to obstacles, adversaries, target zones, and formation dynamics, makes the solution space high-dimensional and hard to search. In this paper, we propose a novel two-level framework, Consensus Inference-based Hierarchical Reinforcement Learning (CI-HRL), which delegates target localization to a high-level policy while adopting a low-level policy to manage obstacle avoidance, navigation, and formation. Specifically, in the high-level policy, we develop a novel multi-agent reinforcement learning module, Consensus-oriented Multi-Agent Communication (ConsMAC), to enable agents to perceive global information and establish consensus from local states by effectively aggregating neighbor messages. Meanwhile, we leverage Alternative Training-based Multi-agent proximal policy optimization (AT-M) and policy distillation to accomplish the low-level control. The experimental results, including high-fidelity software-in-the-loop (SITL) simulations, validate that CI-HRL provides a superior solution, enhancing the swarm's collaborative evasion and task completion capabilities.
Nowadays, quadrotor Unmanned Aerial Vehicles (UAVs) have demonstrated great potential in costly or human-unfriendly tasks (e.g., disaster response [1]), due to their agility, cost-effectiveness, and compact size. Nevertheless, a UAV swarm is likely to be exposed to an adversarial environment, where a hostile agent or other factor might attack its members, and it must respond promptly to improve its chances of survival.
Yuming Xiang, Sizhao Li, and Rongpeng Li are with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China (email: {xiangym1999; liszh5; lirongpeng}@zju.edu.cn).
Jun-24-2025
- Country:
  - Asia
    - China
      - Tianjin Province > Tianjin (0.04)
      - Zhejiang Province > Hangzhou (0.24)
    - Japan > Honshū
      - Tōhoku > Miyagi Prefecture > Sendai (0.04)
    - Macao (0.04)
    - Malaysia > Kuala Lumpur
      - Kuala Lumpur (0.04)
  - Europe > Austria
    - Vienna (0.14)
  - North America
    - Canada (0.04)
    - Puerto Rico > San Juan
      - San Juan (0.04)
    - United States > California (0.04)
  - Oceania > New Zealand
    - North Island > Auckland Region > Auckland (0.04)
- Genre:
  - Instructional Material > Course Syllabus & Notes (0.34)
  - Research Report > Promising Solution (0.34)
- Industry:
  - Aerospace & Defense > Aircraft (0.34)
  - Information Technology (0.48)