visual-inertial odometry
Dual-Agent Reinforcement Learning for Adaptive and Cost-Aware Visual-Inertial Odometry
Pan, Feiyang, Zheng, Shenghe, Yin, Chunyan, Dou, Guangbin
Visual-Inertial Odometry (VIO) is a critical component for robust ego-motion estimation, enabling foundational capabilities such as autonomous navigation in robotics and real-time 6-DoF tracking for augmented reality. Existing methods face a well-known trade-off: filter-based approaches are efficient but prone to drift, while optimization-based methods, though accurate, rely on computationally prohibitive Visual-Inertial Bundle Adjustment (VIBA) that is difficult to run on resource-constrained platforms. Rather than removing VIBA altogether, we aim to reduce how often and how heavily it must be invoked. To this end, we cast two key design choices in modern VIO, when to run the visual frontend and how strongly to trust its output, as sequential decision problems, and solve them with lightweight reinforcement learning (RL) agents. Our framework introduces a lightweight, dual-pronged RL policy that serves as our core contribution: (1) a Select Agent intelligently gates the entire VO pipeline based only on high-frequency IMU data; and (2) a composite Fusion Agent that first estimates a robust velocity state via a supervised network, before an RL policy adaptively fuses the full (p, v, q) state. Experiments on the EuRoC MAV and TUM-VI datasets show that, in our unified evaluation, the proposed method achieves a more favorable accuracy-efficiency-memory trade-off than prior GPU-based VO/VIO systems: it attains the best average ATE while running up to 1.77 times faster and using less GPU memory. Compared to classical optimization-based VIO systems, our approach maintains competitive trajectory accuracy while substantially reducing computational load.
SP-VINS: A Hybrid Stereo Visual Inertial Navigation System based on Implicit Environmental Map
Du, Xueyu, Zhang, Lilian, Duan, Fuan, Luo, Xincan, Wang, Maosong, Wu, Wenqi, JunMao, null
Abstract-- Filter-based visual inertial navigation system (VINS) has attracted mobile-robot researchers for the good balance between accuracy and efficiency, but its limited mapping quality hampers long-term high-accuracy state estimation. T o this end, we first propose a novel filter-based stereo VINS, differing from traditional simultaneous localization and mapping (SLAM) systems based on 3D map, which performs efficient loop closure constraints with implicit environmental map composed of keyframes and 2D keypoints. Secondly, we proposed a hybrid residual filter framework that combines landmark reprojection and ray constraints to construct a unified Ja-cobian matrix for measurement updates. Finally, considering the degraded environment, we incorporated the camera-IMU extrinsic parameters into visual description to achieve online calibration. Benchmark experiments demonstrate that the proposed SP-VINS achieves high computational efficiency while maintaining long-term high-accuracy localization performance, and is superior to existing state-of-the-art (SOT A) methods.
TCB-VIO: Tightly-Coupled Focal-Plane Binary-Enhanced Visual Inertial Odometry
Lisondra, Matthew, Kim, Junseo, Shimoda, Glenn Takashi, Zareinia, Kourosh, Saeedi, Sajad
Vision algorithms can be executed directly on the image sensor when implemented on the next-generation sensors known as focal-plane sensor-processor arrays (FPSP)s, where every pixel has a processor. FPSPs greatly improve latency, reducing the problems associated with the bottleneck of data transfer from a vision sensor to a processor. FPSPs accelerate vision-based algorithms such as visual-inertial odometry (VIO). However, VIO frameworks suffer from spatial drift due to the vision-based pose estimation, whilst temporal drift arises from the inertial measurements. FPSPs circumvent the spatial drift by operating at a high frame rate to match the high-frequency output of the inertial measurements. In this paper, we present TCB-VIO, a tightly-coupled 6 degrees-of-freedom VIO by a Multi-State Constraint Kalman Filter (MSCKF), operating at a high frame-rate of 250 FPS and from IMU measurements obtained at 400 Hz. TCB-VIO outperforms state-of-the-art methods: ROVIO, VINS-Mono, and ORB-SLAM3.
Statistical Uncertainty Learning for Robust Visual-Inertial State Estimation
Choi, Seungwon, Park, Donggyu, Hwang, Seo-Yeon, Kim, Tae-Wan
A fundamental challenge in robust visual-inertial odometry (VIO) is to dynamically assess the reliability of sensor measurements. This assessment is crucial for properly weighting the contribution of each measurement to the state estimate. Conventional methods often simplify this by assuming a static, uniform uncertainty for all measurements. This heuristic, however, may be limited in its ability to capture the dynamic error characteristics inherent in real-world data. To improve this limitation, we present a statistical framework that learns measurement reliability assessment online, directly from sensor data and optimization results. Our approach leverages multi-view geometric consistency as a form of self-supervision. This enables the system to infer landmark uncertainty and adaptively weight visual measurements during optimization. We evaluated our method on the public EuRoC dataset, demonstrating improvements in tracking accuracy with average reductions of approximately 24\% in translation error and 42\% in rotation error compared to baseline methods with fixed uncertainty parameters. The resulting framework operates in real time while showing enhanced accuracy and robustness. To facilitate reproducibility and encourage further research, the source code will be made publicly available.
Benchmarking Egocentric Visual-Inertial SLAM at City Scale
Krishnan, Anusha, Liu, Shaohui, Sarlin, Paul-Edouard, Gentilhomme, Oscar, Caruso, David, Monge, Maurizio, Newcombe, Richard, Engel, Jakob, Pollefeys, Marc
Precise 6-DoF simultaneous localization and mapping (SLAM) from onboard sensors is critical for wearable devices capturing egocentric data, which exhibits specific challenges, such as a wider diversity of motions and viewpoints, prevalent dynamic visual content, or long sessions affected by time-varying sensor calibration. While recent progress on SLAM has been swift, academic research is still driven by benchmarks that do not reflect these challenges or do not offer sufficiently accurate ground truth poses. In this paper, we introduce a new dataset and benchmark for visual-inertial SLAM with egocentric, multi-modal data. We record hours and kilometers of trajectories through a city center with glasses-like devices equipped with various sensors. We leverage surveying tools to obtain control points as indirect pose annotations that are metric, centimeter-accurate, and available at city scale. This makes it possible to evaluate extreme trajectories that involve walking at night or traveling in a vehicle. We show that state-of-the-art systems developed by academia are not robust to these challenges and we identify components that are responsible for this. In addition, we design tracks with different levels of difficulty to ease in-depth analysis and evaluation of less mature approaches. The dataset and benchmark are available at https://www.lamaria.ethz.ch.
Observer Design for Optical Flow-Based Visual-Inertial Odometry with Almost-Global Convergence
Bouazza, Tarek, Berkane, Soulaimane, Hua, Minh-Duc, Hamel, Tarek
This paper presents a novel cascaded observer architecture that combines optical flow and IMU measurements to perform continuous monocular visual-inertial odometry (VIO). The proposed solution estimates body-frame velocity and gravity direction simultaneously by fusing velocity direction information from optical flow measurements with gyro and accelerometer data. This fusion is achieved using a globally exponentially stable Riccati observer, which operates under persistently exciting translational motion conditions. The estimated gravity direction in the body frame is then employed, along with an optional magnetometer measurement, to design a complementary observer on $\mathbf{SO}(3)$ for attitude estimation. The resulting interconnected observer architecture is shown to be almost globally asymptotically stable. To extract the velocity direction from sparse optical flow data, a gradient descent algorithm is developed to solve a constrained minimization problem on the unit sphere. The effectiveness of the proposed algorithms is validated through simulation results.
Event-based Stereo Visual-Inertial Odometry with Voxel Map
Zhang, Zhaoxing, Wang, Xiaoxiang, Zhang, Chengliang, Guo, Yangyang, Yuan, Zikang, Yang, Xin
The event camera, renowned for its high dynamic range and exceptional temporal resolution, is recognized as an important sensor for visual odometry. However, the inherent noise in event streams complicates the selection of high-quality map points, which critically determine the precision of state estimation. To address this challenge, we propose Voxel-ESVIO, an event-based stereo visual-inertial odometry system that utilizes voxel map management, which efficiently filter out high-quality 3D points. Specifically, our methodology utilizes voxel-based point selection and voxel-aware point management to collectively optimize the selection and updating of map points on a per-voxel basis. These synergistic strategies enable the efficient retrieval of noise-resilient map points with the highest observation likelihood in current frames, thereby ensureing the state estimation accuracy. Extensive evaluations on three public benchmarks demonstrate that our Voxel-ESVIO outperforms state-of-the-art methods in both accuracy and computational efficiency.
A Novel ViDAR Device With Visual Inertial Encoder Odometry and Reinforcement Learning-Based Active SLAM Method
Xin, Zhanhua, Wang, Zhihao, Zhang, Shenghao, Chi, Wanchao, Meng, Yan, Kong, Shihan, Xiong, Yan, Zhang, Chong, Liu, Yuzhen, Yu, Junzhi
Abstract--In the field of multi-sensor fusion for simultaneous localization and mapping (SLAM), monocular cameras and IMU s are widely used to build simple and effective visual-inerti al systems. However, limited research has explored the integr ation of motor-encoder devices to enhance SLAM performance. By incorporating such devices, it is possible to significantly improve active capability and field of view (FOV) with minimal additi onal cost and structural complexity. This paper proposes a novel visual-inertial-encoder tightly coupled odometry (VIEO) based on a ViDAR (Video Detection and Ranging) device. A ViDAR calibration method is introduced to ensure accurate initia lization for VIEO. In addition, a platform motion decoupled active SLAM method based on deep reinforcement learning (DRL) is proposed. Experimental data demonstrate that the proposed Vi-DAR and the VIEO algorithm significantly increase cross-fra me co-visibility relationships compared to its correspondin g visual-inertial odometry (VIO) algorithm, improving state estima tion accuracy. The proposed methodolog y sheds fresh insights into both the updated platform design and decoupled approach of active SLAM systems in complex environments. N recent years, visual odometry (VO) and visual-inertial odometry (VIO) have made significant advancements. This work was supported in part by the Beijing Natural Scienc e Foundation under Grant 2022MQ05, in part by the CIE-Tencent Robotics X R hino-Bird Focused Research Program under Grant 2022-07, and in part by the National Natural Science Foundation of China under Grant 62203015, G rant 62303020, Grant 62303021, and Grant 62273351. Zhanhua Xin, Zhihao Wang, Shihan Kong, Y an Xiong, and Junzhi Y u are with the State Key Laboratory for Turbulence and Complex Systems, Department of Advanced Manufacturing and Robotics, C ollege of Engineering, Peking University, Beijing 100871, China (email: xinzhan-hua@stu.pku.edu.cn;
Structureless VIO
Song, Junlin, Olivares-Mendez, Miguel
Visual odometry (VO) is typically considered as a chicken-and-egg problem, as the localization and mapping modules are tightly-coupled. The estimation of a visual map relies on accurate localization information. Meanwhile, localization requires precise map points to provide motion constraints. This classical design principle is naturally inherited by visual-inertial odometry (VIO). Efficient localization solutions that do not require a map have not been fully investigated. To this end, we propose a novel structureless VIO, where the visual map is removed from the odometry framework. Experimental results demonstrated that, compared to the structure-based VIO baseline, our structureless VIO not only substantially improves computational efficiency but also has advantages in accuracy.
Edge-Enabled VIO with Long-Tracked Features for High-Accuracy Low-Altitude IoT Navigation
Huang, Xiaohong, Yang, Cui, Wen, Miaowen
This paper presents a visual-inertial odometry (VIO) method using long-tracked features. Long-tracked features can constrain more visual frames, reducing localization drift. However, they may also lead to accumulated matching errors and drift in feature tracking. Current VIO methods adjust observation weights based on re-projection errors, yet this approach has flaws. Re-projection errors depend on estimated camera poses and map points, so increased errors might come from estimation inaccuracies, not actual feature tracking errors. This can mislead the optimization process and make long-tracked features ineffective for suppressing localization drift. Furthermore, long-tracked features constrain a larger number of frames, which poses a significant challenge to real-time performance of the system. To tackle these issues, we propose an active decoupling mechanism for accumulated errors in long-tracked feature utilization. We introduce a visual reference frame reset strategy to eliminate accumulated tracking errors and a depth prediction strategy to leverage the long-term constraint. To ensure real time preformane, we implement three strategies for efficient system state estimation: a parallel elimination strategy based on predefined elimination order, an inverse-depth elimination simplification strategy, and an elimination skipping strategy. Experiments on various datasets show that our method offers higher positioning accuracy with relatively short consumption time, making it more suitable for edge-enabled low-altitude IoT navigation, where high-accuracy positioning and real-time operation on edge device are required. The code will be published at github.