Efficient End-to-end Visual Localization for Autonomous Driving with Decoupled BEV Neural Matching

Jinyu Miao, Tuopu Wen, Ziang Luo, Kangan Qian, Zheng Fu, Yunlong Wang, Kun Jiang, Mengmeng Yang, Jin Huang, Zhihua Zhong, Diange Yang

arXiv.org Artificial Intelligence 

Accurate localization plays an important role in high-level autonomous driving systems. Conventional map-matching-based localization methods solve for poses by explicitly matching map elements with sensor observations; they are generally sensitive to perception noise and therefore require costly hyper-parameter tuning. In this paper, we propose an end-to-end localization neural network that directly estimates vehicle poses from surrounding images, without explicitly matching perception results against HD maps. To ensure efficiency and interpretability, a decoupled BEV neural matching-based pose solver is proposed, which estimates poses in a differentiable sampling-based matching module. Moreover, the sampling space is greatly reduced by decoupling the feature representation affected by each DoF of the pose. The experimental results demonstrate that the proposed network is capable of decimeter-level localization, with mean absolute errors of 0.19 m, 0.13 m, and 0.39° in longitudinal position, lateral position, and yaw angle.

Visual localization serves as a vital component in high-level Autonomous Driving (AD) systems due to its ability to estimate vehicle poses with an economical sensor suite. In recent decades, several works have achieved extraordinary success in terms of localization accuracy and robustness [1]. A plethora of scene maps has been developed in visual localization research, yielding varying degrees of pose estimation accuracy [1]. In conventional robotic systems, visual localization systems often employ geo-tagged frames [2], [3] and visual landmark maps [4].
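To make the decoupled, differentiable sampling-based matching idea concrete, the following is a minimal sketch, not the paper's actual module. All names (`decoupled_pose_matching`, `bev_obs`, `bev_map`, `candidates`) are hypothetical, and the specific decoupling choices (axis-marginalized 1-D profiles for translation, circular-shift correlation, rotation via an affine grid for yaw) are illustrative assumptions. The key property it demonstrates is that each DoF is scored over its own small 1-D candidate set instead of a joint 3-DoF grid, and the pose is read out as a softmax-weighted expectation, which keeps the whole solver differentiable so gradients from a pose loss can flow back into both feature maps.

```python
import math

import torch
import torch.nn.functional as F


def decoupled_pose_matching(bev_obs, bev_map, candidates):
    """Score each DoF over its own 1-D candidate set and return the
    softmax-weighted (differentiable) pose expectation.

    bev_obs, bev_map: (B, C, H, W) BEV feature maps (hypothetical inputs).
    candidates: dict with 1-D tensors of candidate offsets for
        'x', 'y' (in BEV cells) and 'yaw' (in degrees).
    """
    B, C, H, W = bev_obs.shape
    pose = {}

    # Translation: marginalize each feature map over the other axis so the
    # 2-D translation search decouples into two 1-D correlation searches
    # over much smaller candidate sets.
    profiles = {
        "x": (bev_obs.mean(dim=2), bev_map.mean(dim=2)),  # (B, C, W)
        "y": (bev_obs.mean(dim=3), bev_map.mean(dim=3)),  # (B, C, H)
    }
    for dof, (p_obs, p_map) in profiles.items():
        scores = []
        for off in candidates[dof].tolist():
            # Circular shift approximates translation for small offsets.
            shifted = torch.roll(p_map, shifts=int(off), dims=2)
            scores.append((p_obs * shifted).flatten(1).sum(dim=1))  # (B,)
        scores = torch.stack(scores, dim=1)                         # (B, K)
        probs = F.softmax(scores, dim=1)
        pose[dof] = (probs * candidates[dof].to(scores)).sum(dim=1)

    # Yaw: rotate the map feature by each sampled angle and correlate.
    scores = []
    for ang in candidates["yaw"].tolist():
        c, s = math.cos(math.radians(ang)), math.sin(math.radians(ang))
        theta = torch.tensor([[c, -s, 0.0], [s, c, 0.0]],
                             dtype=bev_map.dtype, device=bev_map.device)
        theta = theta.unsqueeze(0).expand(B, -1, -1)
        grid = F.affine_grid(theta, list(bev_map.shape), align_corners=False)
        rotated = F.grid_sample(bev_map, grid, align_corners=False)
        scores.append((bev_obs * rotated).flatten(1).sum(dim=1))    # (B,)
    scores = torch.stack(scores, dim=1)
    probs = F.softmax(scores, dim=1)
    pose["yaw"] = (probs * candidates["yaw"].to(scores)).sum(dim=1)
    return pose


# Toy usage: random features and small per-DoF candidate sets.
bev_obs = torch.randn(2, 8, 64, 64)
bev_map = torch.randn(2, 8, 64, 64)
candidates = {
    "x": torch.arange(-4.0, 5.0),          # 9 translation candidates (cells)
    "y": torch.arange(-4.0, 5.0),          # 9 translation candidates (cells)
    "yaw": torch.linspace(-3.0, 3.0, 7),   # 7 yaw candidates (degrees)
}
pose = decoupled_pose_matching(bev_obs, bev_map, candidates)
print({k: v.shape for k, v in pose.items()})  # each DoF: shape (B,)
```

Note the cost argument this illustrates: with K candidates per DoF, a joint search scores K^3 hypotheses, whereas the decoupled variant scores only 3K, which is where the claimed reduction of the sampling space comes from.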