Multi-Visual-Inertial System: Analysis, Calibration and Estimation

Yang, Yulin, Geneva, Patrick, Huang, Guoquan

arXiv.org Artificial Intelligence 

The combination of cameras and inertial measurement units (IMUs) has become prevalent in autonomous vehicles and mobile devices over the past decade due to their decreasing cost and complementary sensing nature. A camera provides texture-rich images offering 2 degree-of-freedom (DoF) bearing observations of environmental features, while a 6-axis IMU typically consists of a gyroscope and an accelerometer, which measure high-frequency angular velocity and linear acceleration, respectively. This has led to significant progress in developing visual-inertial navigation system (VINS) algorithms focusing on efficient and accurate pose estimation (Huang 2019). While many works have shown accurate estimation for the minimal sensing case of a single camera and IMU (Mourikis and Roumeliotis 2007; Bloesch et al. 2015; Forster et al. 2016; Qin et al. 2018; Geneva et al. 2020), it is known that the inclusion of additional sensors can provide improved accuracy due to additional information and robustness to single sensor failure cases (Paul et al.

Regarding state estimation, many works have explored using multiple vision sensors for better VINS performance (Leutenegger et al. 2015; Usenko et al. 2016; Paul et al. 2017; Sun et al. 2018; Kuo et al. 2020; Campos et al. 2021; Fu et al. 2021). In particular, Leutenegger et al. (2015), Usenko et al. (2016) and Fu et al. (2021) have shown that a stereo camera or multiple cameras can achieve better pose accuracy or lower the uncertainty of IMU-camera calibration. Only a few recent works investigate multiple inertial sensor fusion for VINS (Kim et al. 2017; Eckenhoff et al. 2019b; Zhang et al. 2020; Wu et al. 2023; Faizullin and Ferrer 2023), showing that system robustness and pose accuracy can be improved by fusing additional IMUs. For optimal fusion of multiple asynchronous visual and inertial sensors in a multi-visual-inertial system (MVIS), it is crucial to provide accurate full-parameter calibration for these sensors, which includes: (i) IMU-IMU/camera rigid transformation, (ii) IMU-IMU/camera time offset, (iii)