HEADER: Hierarchical Robot Exploration via Attention-Based Deep Reinforcement Learning with Expert-Guided Reward

Cao, Yuhong, Wang, Yizhuo, Liang, Jingsong, Liao, Shuhao, Zhang, Yifeng, Li, Peizhuo, Sartoretti, Guillaume

arXiv.org Artificial Intelligence 

Abstract--This work pushes the boundaries of learning-based methods in autonomous robot exploration in terms of environmental scale and exploration efficiency. HEADER follows existing conventional methods to construct hierarchical representations for the robot belief/map, but further designs a novel community-based algorithm to construct and update a global graph, which remains fully incremental, shape-adaptive, and operates with linear complexity. Building upon attention-based networks, our planner finely reasons about the nearby belief within the local range while coarsely leveraging distant information at the global scale, enabling next-best-viewpoint decisions that consider multi-scale spatial dependencies. Beyond the novel map representation, we introduce a parameter-free privileged reward that significantly improves model performance and produces near-optimal exploration behaviors by avoiding the training objective bias caused by handcrafted reward shaping. In challenging, large-scale simulated exploration scenarios, HEADER demonstrates better scalability than most existing learning and non-learning methods, while achieving a significant improvement in exploration efficiency (up to 20%) over state-of-the-art baselines.

In autonomous exploration, a mobile robot is tasked with exploring and mapping an unknown environment as fast as possible. By planning and executing its exploration path, the robot classifies unknown areas into free or obstacle areas based on its accumulated sensor measurements. In this work, we focus on tasks where a ground robot is equipped with an omnidirectional 3D LiDAR to obtain long-range, low-noise, and dense point cloud measurements. Recent advancements in LiDAR odometry have enabled accurate and robust localization and mapping in large-scale environments [1]-[3], allowing recent planners to focus on exploring the environment without concerns about mapping/localization accuracy [4]-[9].
Despite this, few planners support exploration at large scale in real-world environments [5], [10], mainly due to the complexity that comes with long-term, real-time path planning requirements. That is, to achieve efficient exploration, the planner must actively react to belief and map updates at a high frequency by (re-)reasoning about the entire partial belief and replanning a long-term, non-myopic exploration path.

Authors are with the Department of Mechanical Engineering, College of Design and Engineering, National University of Singapore.

Example hierarchical graph constructed by HEADER during its autonomous exploration of our campus.