LiDAR Registration with Visual Foundation Models

Vödisch, Niclas, Cioffi, Giovanni, Cannici, Marco, Burgard, Wolfram, Scaramuzza, Davide

arXiv.org Artificial Intelligence 

Niclas Vödisch 1,2, Giovanni Cioffi 2, Marco Cannici 2, Wolfram Burgard 3, and Davide Scaramuzza 2

1 University of Freiburg
2 University of Zurich
3 University of Technology Nuremberg

Abstract: LiDAR registration is a fundamental task in robotic mapping and localization. A critical component of aligning two point clouds is identifying robust point correspondences using point descriptors. This step becomes particularly challenging in scenarios involving domain shifts, seasonal changes, and variations in point cloud structures. In this paper, we address these problems by proposing to use DINOv2 features, obtained from surround-view images, as point descriptors. We demonstrate that coupling these descriptors with traditional registration algorithms, such as RANSAC or ICP, facilitates robust 6DoF alignment of LiDAR scans with 3D maps, even when the map was recorded more than a year before. Although conceptually straightforward, our method substantially outperforms more complex baseline techniques. In contrast to previous learning-based point descriptors, our method does not require domain-specific retraining and is agnostic to the point cloud structure, effectively handling both sparse LiDAR scans and dense 3D maps. We show that leveraging the additional camera data enables our method to outperform the best baseline by +24.8 and +17.3 registration recall on the NCLT and Oxford RobotCar datasets. We publicly release the registration benchmark and the code of our work on https://vfm-registration.cs.uni-freiburg.de.

I. INTRODUCTION

Aligning two point clouds to compute their relative 3D transformation is a critical task in numerous robotic applications, including LiDAR odometry [30], loop closure registration [2], and map-based localization [19].
In this work, we specifically discuss map-based localization, which not only generalizes the other aforementioned tasks but is also critical for improving the efficiency and autonomy of mobile robots in environments where pre-existing map data is available.
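To make the idea of descriptor-based registration concrete, the following is a minimal sketch of the generic pipeline named in the abstract: match per-point descriptors between a scan and a map, then estimate a rigid 6DoF transform with RANSAC. It is not the authors' implementation. It assumes that each point already carries a descriptor vector (the DINOv2 feature extraction and the image-to-point association are omitted), that matching uses cosine-similarity nearest neighbors, and that the function names (`kabsch`, `match_descriptors`, `ransac_register`) and the values of `iters` and `thresh` are hypothetical choices made only for illustration.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def match_descriptors(desc_scan, desc_map):
    """For each scan descriptor, index of the most similar map descriptor.

    Cosine similarity is an assumption of this sketch.
    """
    a = desc_scan / np.linalg.norm(desc_scan, axis=1, keepdims=True)
    b = desc_map / np.linalg.norm(desc_map, axis=1, keepdims=True)
    return np.argmax(a @ b.T, axis=1)


def ransac_register(scan, map_pts, nn_idx, iters=200, thresh=0.5, seed=0):
    """RANSAC over putative correspondences scan[i] <-> map_pts[nn_idx[i]].

    iters and thresh (in the unit of the point coordinates) are
    illustrative values, not tuned settings.
    """
    rng = np.random.default_rng(seed)
    target = map_pts[nn_idx]
    best_inliers = np.zeros(len(scan), dtype=bool)
    for _ in range(iters):
        # Three correspondences are the minimal set for a rigid transform.
        sample = rng.choice(len(scan), size=3, replace=False)
        R, t = kabsch(scan[sample], target[sample])
        residuals = np.linalg.norm(scan @ R.T + t - target, axis=1)
        inliers = residuals < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("RANSAC found fewer than 3 inlier correspondences")
    # Refit on all inliers of the best hypothesis.
    R, t = kabsch(scan[best_inliers], target[best_inliers])
    return R, t, best_inliers
```

Under these assumptions, the returned `R` and `t` map scan coordinates into the map frame. A local refinement such as ICP could then start from this estimate.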