Bias-Eliminated PnP for Stereo Visual Odometry: Provably Consistent and Large-Scale Localization

Zeng, Guangyang; Shen, Yuan; Hong, Ziyang; Hong, Yuze; Ila, Viorela; Shi, Guodong; Wu, Junfeng

arXiv.org Artificial Intelligence 

Abstract--In this paper, we first present a bias-eliminated weighted (Bias-Eli-W) perspective-n-point (PnP) estimator for stereo visual odometry (VO) with provable consistency. Specifically, leveraging statistical theory, we develop an asymptotically unbiased and $\sqrt{n}$-consistent PnP estimator that accounts for varying 3D triangulation uncertainties, ensuring that the relative pose estimate converges to the ground truth as the number of features increases. Next, on the stereo VO pipeline side, we propose a framework that continuously triangulates contemporary features for tracking new frames, effectively decoupling temporal dependencies between pose and 3D point errors. We integrate the Bias-Eli-W PnP estimator into the proposed stereo VO pipeline, creating a synergistic effect that enhances the suppression of pose estimation errors. Experimental results demonstrate that our method: 1) achieves significant improvements in both relative pose error and absolute trajectory error in large-scale environments; 2) provides reliable localization under erratic and unpredictable robot motions. The successful implementation of the Bias-Eli-W PnP in stereo VO indicates the importance of information screening in robotic estimation tasks with high-uncertainty measurements, shedding light on diverse applications where PnP is a key ingredient.

Index Terms--Stereo visual odometry, PnP pose estimation, large-scale localization, consistent estimator.

Visual odometry (VO) refers to estimating the pose of a moving camera in 3D space from sequential images captured by the camera. The significance of VO stems from its advantages of being infrastructure-free, cost-effective, lightweight, and energy-efficient [1, 2, 3]. It enables robots to perceive and navigate their environment autonomously. Compared with monocular VO, stereo VO offers several advantages, such as scale consistency, better accuracy, and enhanced robustness, due to its ability to perceive depth directly [4, 5].
Existing VO methods typically optimize both camera poses and 3D map points simultaneously, with the map being used to track new frames through the perspective-n-point (PnP) algorithm [1, 6, 4].
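
To make the role of PnP in frame tracking concrete, the following is a minimal sketch of a plain, unweighted direct linear transform (DLT) PnP solver. It is not the Bias-Eli-W estimator proposed in this paper; it only illustrates the underlying problem of recovering a camera pose from 3D map points and their 2D observations. The function names `project` and `pnp_dlt` are hypothetical, and the sketch assumes noise-free correspondences, known intrinsics (normalized image coordinates), and at least six non-coplanar points.

```python
import numpy as np

def project(R, t, points_3d):
    """Project world points into normalized image coordinates for pose (R, t)."""
    cam = points_3d @ R.T + t          # points expressed in the camera frame
    return cam[:, :2] / cam[:, 2:3]    # perspective division

def pnp_dlt(points_3d, points_2d):
    """Estimate (R, t) from n >= 6 non-coplanar 3D-2D correspondences via DLT."""
    n = points_3d.shape[0]
    Xh = np.hstack([points_3d, np.ones((n, 1))])    # homogeneous 3D points
    u, v = points_2d[:, 0:1], points_2d[:, 1:2]
    zeros = np.zeros_like(Xh)
    # Each correspondence gives two linear constraints on the 3x4 matrix P = s[R | t].
    A = np.vstack([
        np.hstack([Xh, zeros, -u * Xh]),
        np.hstack([zeros, Xh, -v * Xh]),
    ])
    # The null vector of A (last right-singular vector) holds P up to scale and sign.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    # Project the left 3x3 block onto the rotation group and undo the scale.
    U, S, Wt = np.linalg.svd(P[:, :3])
    R = U @ Wt
    sign = np.sign(np.linalg.det(R))   # resolves the sign ambiguity of the null vector
    R = sign * R
    t = P[:, 3] / (sign * S.mean())
    return R, t

# Synthetic example (assumed values): small rotation about the y-axis, points in front of the camera.
rng = np.random.default_rng(0)
angle = 0.1
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([0.2, -0.1, 0.3])
points_3d = rng.uniform(-1.0, 1.0, size=(20, 3)) + np.array([0.0, 0.0, 5.0])
points_2d = project(R_true, t_true, points_3d)

R_est, t_est = pnp_dlt(points_3d, points_2d)
print("rotation error:", np.linalg.norm(R_est - R_true))
print("translation error:", np.linalg.norm(t_est - t_true))
```

With exact data this linear solver recovers the pose up to numerical precision. It treats the 3D points as error-free, which does not hold for triangulated stereo features, and that gap is what the weighting and bias elimination described in the abstract are meant to address.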