BEV-ODOM2: Enhanced BEV-based Monocular Visual Odometry with PV-BEV Fusion and Dense Flow Supervision for Ground Robots
Wei, Yufei, Lu, Wangtao, Lu, Sha, Hu, Chenxiao, Han, Fuzhang, Xiong, Rong, Wang, Yue
arXiv.org Artificial Intelligence
Abstract: Bird's-Eye-View (BEV) representation offers a metric-scaled planar workspace, which allows 6-DoF ego-motion to be simplified to a more robust 3-DoF model for monocular visual odometry (MVO) in intelligent transportation systems. However, existing BEV methods suffer from sparse supervision signals and from information loss during perspective-to-BEV projection. Our approach introduces: (1) dense BEV optical flow supervision, constructed from 3-DoF pose ground truth, for pixel-level guidance; and (2) PV-BEV fusion that computes correlation volumes before projection to preserve 6-DoF motion cues while maintaining scale consistency. The framework employs three supervision levels derived solely from pose data: dense BEV flow, 5-DoF for the PV branch, and the final 3-DoF output. Extensive evaluation on KITTI, NCLT, Oxford, and our newly collected ZJH-VO multi-scale dataset demonstrates state-of-the-art performance, with a 40% improvement in RTE over previous BEV methods. The ZJH-VO dataset, covering diverse ground vehicle scenarios from underground parking to outdoor plazas, is publicly available to facilitate future research.
Bird's-Eye-View (BEV) representation has become a cornerstone for perception and localization tasks in modern intelligent transportation systems [1]-[3], offering a powerful solution to the scale drift problem inherent in Monocular Visual Odometry (MVO) [4], [5]. For ground vehicles such as autonomous cars and logistics robots, motion is predominantly planar [6]. This allows pose estimation to be simplified from six degrees of freedom (6-DoF) to a more robust 3-DoF model (x, y, yaw), which naturally aligns with the unified, metric-scaled grid of the BEV representation [7]. This simplification not only reduces computational complexity but also mitigates the accumulation of errors along non-primary motion axes, a common source of drift in long-range navigation.
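To make the idea of dense BEV flow derived from a 3-DoF pose concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a static scene, a grid with the ego vehicle at its center, x along rows and y along columns, and a relative pose (dx, dy, dyaw) giving frame t+1 expressed in frame t. The function name `bev_flow_from_pose` and all parameters are hypothetical. Under these assumptions a static point p in frame t appears at R(dyaw)^T (p - t) in frame t+1, so every BEV cell receives a flow vector from the pose alone.

```python
import numpy as np


def bev_flow_from_pose(dx, dy, dyaw, height=5, width=5, resolution=1.0):
    """Dense BEV flow (in metres) induced by a 3-DoF ego-motion.

    Assumptions (illustrative, not taken from the paper): static scene,
    ego at the grid centre, x along rows, y along columns, and
    (dx, dy, dyaw) is the pose of frame t+1 expressed in frame t.
    Returns an array of shape (height, width, 2).
    """
    # Metric coordinates of every cell centre in the ego frame at time t.
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    x = (rows - (height - 1) / 2.0) * resolution
    y = (cols - (width - 1) / 2.0) * resolution
    pts = np.stack([x, y], axis=-1)  # (H, W, 2)

    # Rotation matrix R(dyaw) and translation t of the relative pose.
    c, s = np.cos(dyaw), np.sin(dyaw)
    rot = np.array([[c, -s], [s, c]])
    trans = np.array([dx, dy])

    # Static points seen from frame t+1: p' = R^T (p - t).
    # For row vectors, (p - t) @ R is equivalent to R^T (p - t).
    pts_next = (pts - trans) @ rot

    # Flow is the apparent displacement of each cell between the frames.
    return pts_next - pts


if __name__ == "__main__":
    flow = bev_flow_from_pose(1.0, 0.0, 0.0)
    print(flow[0, 0])  # apparent motion of one cell for 1 m forward motion
```

For a pure forward translation of 1 m, every cell gets the same flow of (-1, 0) m, while a pure rotation yields a flow whose magnitude grows with distance from the ego position. Dividing the result by `resolution` would express the flow in grid cells rather than metres.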
Sep-19-2025
- Country:
  - Asia > China (0.04)
  - Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
  - North America > United States > Michigan (0.04)
- Genre:
  - Research Report (0.64)
- Industry:
  - Transportation
    - Ground > Road (0.66)
    - Infrastructure & Services (0.68)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks > Deep Learning (0.46)
    - Robots (1.00)
    - Vision (1.00)