relocalization
When and Where Localization Fails: An Analysis of the Iterative Closest Point in Evolving Environment
Dannaoui, Abdel-Raouf, Laconte, Johann, Debain, Christophe, Pomerleau, Francois, Checchin, Paul
Robust relocalization in dynamic outdoor environments remains a key challenge for autonomous systems relying on 3D lidar. While long-term localization has been widely studied, short-term environmental changes, occurring over days or weeks, remain underexplored despite their practical significance. To address this gap, we present a highresolution, short-term multi-temporal dataset collected weekly from February to April 2025 across natural and semi-urban settings. Each session includes high-density point cloud maps, 360 deg panoramic images, and trajectory data. Projected lidar scans, derived from the point cloud maps and modeled with sensor-accurate occlusions, are used to evaluate alignment accuracy against the ground truth using two Iterative Closest Point (ICP) variants: Point-to-Point and Point-to-Plane. Results show that Point-to-Plane offers significantly more stable and accurate registration, particularly in areas with sparse features or dense vegetation. This study provides a structured dataset for evaluating short-term localization robustness, a reproducible framework for analyzing scan-to-map alignment under noise, and a comparative evaluation of ICP performance in evolving outdoor environments. Our analysis underscores how local geometry and environmental variability affect localization success, offering insights for designing more resilient robotic systems.
- Europe > France > Auvergne-Rhône-Alpes > Puy-de-Dôme > Clermont-Ferrand (0.05)
- North America > United States > Michigan (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Experimental Study (0.48)
- Research Report > New Finding (0.34)
Language-Grounded Hierarchical Planning and Execution with Multi-Robot 3D Scene Graphs
Strader, Jared, Ray, Aaron, Arkin, Jacob, Peterson, Mason B., Chang, Yun, Hughes, Nathan, Bradley, Christopher, Jia, Yi Xuan, Nieto-Granda, Carlos, Talak, Rajat, Fan, Chuchu, Carlone, Luca, How, Jonathan P., Roy, Nicholas
In this paper, we introduce a multi-robot system that integrates mapping, localization, and task and motion planning (TAMP) enabled by 3D scene graphs to execute complex instructions expressed in natural language. Our system builds a shared 3D scene graph incorporating an open-set object-based map, which is leveraged for multi-robot 3D scene graph fusion. This representation supports real-time, view-invariant relocalization (via the object-based map) and planning (via the 3D scene graph), allowing a team of robots to reason about their surroundings and execute complex tasks. Additionally, we introduce a planning approach that translates operator intent into Planning Domain Definition Language (PDDL) goals using a Large Language Model (LLM) by leveraging context from the shared 3D scene graph and robot capabilities. We provide an experimental assessment of the performance of our system on real-world tasks in large-scale, outdoor environments. A supplementary video is available at https://youtu.be/8xbGGOLfLAY.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Maryland > Prince George's County > Adelphi (0.04)
MVL-Loc: Leveraging Vision-Language Model for Generalizable Multi-Scene Camera Relocalization
Xiao, Zhendong, Wei, Wu, Ji, Shujie, Yang, Shan, Chen, Changhao
Camera relocalization, a cornerstone capability of modern computer vision, accurately determines a camera's position and orientation (6-DoF) from images and is essential for applications in augmented reality (AR), mixed reality (MR), autonomous driving, delivery drones, and robotic navigation. Unlike traditional deep learning-based methods that regress camera pose from images in a single scene, which often lack generalization and robustness in diverse environments, we propose MVL-Loc, a novel end-to-end multi-scene 6-DoF camera relocalization framework. MVL-Loc leverages pretrained world knowledge from vision-language models (VLMs) and incorporates multimodal data to generalize across both indoor and outdoor settings. Furthermore, natural language is employed as a directive tool to guide the multi-scene learning process, facilitating semantic understanding of complex scenes and capturing spatial relationships among objects. Extensive experiments on the 7Scenes and Cambridge Landmarks datasets demonstrate MVL-Loc's robustness and state-of-the-art performance in real-world multi-scene camera relocalization, with improved accuracy in both positional and orientational estimates.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
R3GS: Gaussian Splatting for Robust Reconstruction and Relocalization in Unconstrained Image Collections
yan, Xu, Wang, Zhaohui, Wei, Rong, Yu, Jingbo, Li, Dong, Liu, Xiangde
We propose R3GS, a robust reconstruction and relocalization framework tailored for unconstrained datasets. Our method uses a hybrid representation during training. Each anchor combines a global feature from a convolutional neural network (CNN) with a local feature encoded by the multiresolution hash grids [2]. Subsequently, several shallow multi-layer perceptrons (MLPs) predict the attributes of each Gaussians, including color, opacity, and covariance. To mitigate the adverse effects of transient objects on the reconstruction process, we ffne-tune a lightweight human detection network. Once ffne-tuned, this network generates a visibility map that efffciently generalizes to other transient objects (such as posters, banners, and cars) with minimal need for further adaptation. Additionally, to address the challenges posed by sky regions in outdoor scenes, we propose an effective sky-handling technique that incorporates a depth prior as a constraint. This allows the inffnitely distant sky to be represented on the surface of a large-radius sky sphere, signiffcantly reducing ffoaters caused by errors in sky reconstruction. Furthermore, we introduce a novel relocalization method that remains robust to changes in lighting conditions while estimating the camera pose of a given image within the reconstructed 3DGS scene. As a result, R3GS significantly enhances rendering ffdelity, improves both training and rendering efffciency, and reduces storage requirements. Our method achieves state-of-the-art performance compared to baseline methods on in-the-wild datasets. The code will be made open-source following the acceptance of the paper.
Leveraging Semantic Graphs for Efficient and Robust LiDAR SLAM
Wang, Neng, Lu, Huimin, Zheng, Zhiqiang, Wang, Hesheng, Liu, Yun-Hui, Chen, Xieyuanli
Accurate and robust simultaneous localization and mapping (SLAM) is crucial for autonomous mobile systems, typically achieved by leveraging the geometric features of the environment. Incorporating semantics provides a richer scene representation that not only enhances localization accuracy in SLAM but also enables advanced cognitive functionalities for downstream navigation and planning tasks. Existing point-wise semantic LiDAR SLAM methods often suffer from poor efficiency and generalization, making them less robust in diverse real-world scenarios. In this paper, we propose a semantic graph-enhanced SLAM framework, named SG-SLAM, which effectively leverages the geometric, semantic, and topological characteristics inherent in environmental structures. The semantic graph serves as a fundamental component that facilitates critical functionalities of SLAM, including robust relocalization during odometry failures, accurate loop closing, and semantic graph map construction. Our method employs a dual-threaded architecture, with one thread dedicated to online odometry and relocalization, while the other handles loop closure, pose graph optimization, and map update. This design enables our method to operate in real time and generate globally consistent semantic graph maps and point cloud maps. We extensively evaluate our method across the KITTI, MulRAN, and Apollo datasets, and the results demonstrate its superiority compared to state-of-the-art methods. Our method has been released at https://github.com/nubot-nudt/SG-SLAM.
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Asia > South Korea > Daejeon > Daejeon (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- (2 more...)
SCORE: Saturated Consensus Relocalization in Semantic Line Maps
Jiang, Haodong, Zheng, Xiang, Zhang, Yanglin, Zeng, Qingcheng, Li, Yiqian, Hong, Ziyang, Wu, Junfeng
This is the arxiv version for our paper submitted to IEEE/RSJ IROS 2025. We propose a scene-agnostic and light-weight visual relocalization framework that leverages semantically labeled 3D lines as a compact map representation. In our framework, the robot localizes itself by capturing a single image, extracting 2D lines, associating them with semantically similar 3D lines in the map, and solving a robust perspective-n-line problem. To address the extremely high outlier ratios~(exceeding 99.5\%) caused by one-to-many ambiguities in semantic matching, we introduce the Saturated Consensus Maximization~(Sat-CM) formulation, which enables accurate pose estimation when the classic Consensus Maximization framework fails. We further propose a fast global solver to the formulated Sat-CM problems, leveraging rigorous interval analysis results to ensure both accuracy and computational efficiency. Additionally, we develop a pipeline for constructing semantic 3D line maps using posed depth images. To validate the effectiveness of our framework, which integrates our innovations in robust estimation and practical engineering insights, we conduct extensive experiments on the ScanNet++ dataset.
- Asia > China > Hong Kong (0.04)
- Asia > South Korea > Daejeon > Daejeon (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
An Efficient Scene Coordinate Encoding and Relocalization Method
Xu, Kuan, Jiang, Zeyu, Cao, Haozhi, Yuan, Shenghai, Wang, Chen, Xie, Lihua
Scene Coordinate Regression (SCR) is a visual localization technique that utilizes deep neural networks (DNN) to directly regress 2D-3D correspondences for camera pose estimation. However, current SCR methods often face challenges in handling repetitive textures and meaningless areas due to their reliance on implicit triangulation. In this paper, we propose an efficient scene coordinate encoding and relocalization method. Compared with the existing SCR methods, we design a unified architecture for both scene encoding and salient keypoint detection, enabling our system to focus on encoding informative regions, thereby significantly enhancing efficiency. Additionally, we introduce a mechanism that leverages sequential information during both map encoding and relocalization, which strengthens implicit triangulation, particularly in repetitive texture environments. Comprehensive experiments conducted across indoor and outdoor datasets demonstrate that the proposed system outperforms other state-of-the-art (SOTA) SCR methods. Our single-frame relocalization mode improves the recall rate of our baseline by 6.4% and increases the running speed from 56Hz to 90Hz. Furthermore, our sequence-based mode increases the recall rate by 11% while maintaining the original efficiency.
- Asia > Singapore (0.04)
- North America > United States > New York > Erie County > Buffalo (0.04)
Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures
Shen, Qiyuan, Zhao, Hengwang, Yan, Weihao, Wang, Chunxiang, Qin, Tong, Yang, Ming
Cross-modal localization has drawn increasing attention in recent years, while the visual relocalization in prior LiDAR maps is less studied. Related methods usually suffer from inconsistency between the 2D texture and 3D geometry, neglecting the intensity features in the LiDAR point cloud. In this paper, we propose a cross-modal visual relocalization system in prior LiDAR maps utilizing intensity textures, which consists of three main modules: map projection, coarse retrieval, and fine relocalization. In the map projection module, we construct the database of intensity channel map images leveraging the dense characteristic of panoramic projection. The coarse retrieval module retrieves the top-K most similar map images to the query image from the database, and retains the top-K' results by covisibility clustering. The fine relocalization module applies a two-stage 2D-3D association and a covisibility inlier selection method to obtain robust correspondences for 6DoF pose estimation. The experimental results on our self-collected datasets demonstrate the effectiveness in both place recognition and pose estimation tasks.
Map-Free Visual Relocalization Enhanced by Instance Knowledge and Depth Knowledge
Xiao, Mingyu, Chen, Runze, Luo, Haiyong, Zhao, Fang, Wang, Juan, Ma, Xuepeng
Map-free relocalization technology is crucial for applications in autonomous navigation and augmented reality, but relying on pre-built maps is often impractical. It faces significant challenges due to limitations in matching methods and the inherent lack of scale in monocular images. These issues lead to substantial rotational and metric errors and even localization failures in real-world scenarios. Large matching errors significantly impact the overall relocalization process, affecting both rotational and translational accuracy. Due to the inherent limitations of the camera itself, recovering the metric scale from a single image is crucial, as this significantly impacts the translation error. To address these challenges, we propose a map-free relocalization method enhanced by instance knowledge and depth knowledge. By leveraging instance-based matching information to improve global matching results, our method significantly reduces the possibility of mismatching across different objects. The robustness of instance knowledge across the scene helps the feature point matching model focus on relevant regions and enhance matching accuracy. Additionally, we use estimated metric depth from a single image to reduce metric errors and improve scale recovery accuracy. By integrating methods dedicated to mitigating large translational and rotational errors, our approach demonstrates superior performance in map-free relocalization techniques.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Shandong Province (0.04)
- Asia > China > Beijing > Beijing (0.04)
FaVoR: Features via Voxel Rendering for Camera Relocalization
Polizzi, Vincenzo, Cannici, Marco, Scaramuzza, Davide, Kelly, Jonathan
Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image. Among these, sparse feature matching stands out as an efficient, versatile, and generally lightweight approach with numerous applications. However, feature-based methods often struggle with significant viewpoint and appearance changes, leading to matching failures and inaccurate pose estimates. To overcome this limitation, we propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features. By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking. Given an initial pose estimate, we first synthesize descriptors from the voxels using volumetric rendering and then perform feature matching to estimate the camera pose. This methodology enables the generation of descriptors for unseen views, enhancing robustness to view changes. We extensively evaluate our method on the 7-Scenes and Cambridge Landmarks datasets. Our results show that our method significantly outperforms existing state-of-the-art feature representation techniques in indoor environments, achieving up to a 39% improvement in median translation error. Additionally, our approach yields comparable results to other methods for outdoor scenarios while maintaining lower memory and computational costs.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)