Zero-Shot Metric Depth Estimation via Monocular Visual-Inertial Rescaling for Autonomous Aerial Navigation
Yang, Steven, Tian, Xiaoyu, Goel, Kshitij, Tabib, Wennie
–arXiv.org Artificial Intelligence
This paper presents a methodology to predict metric depth from monocular RGB images and an inertial measurement unit (IMU). To enable collision avoidance during autonomous flight, prior works either leverage heavy sensors (e.g., LiDARs or stereo cameras) or rely on data-intensive, domain-specific fine-tuning of monocular metric depth estimation methods. In contrast, we propose several lightweight zero-shot rescaling strategies to obtain metric depth from relative depth estimates via the sparse 3D feature map created using a visual-inertial navigation system. These strategies are compared for their accuracy in diverse simulation environments. The best-performing approach, which leverages monotonic spline fitting, is deployed in the real world on a compute-constrained quadrotor. We obtain on-board metric depth estimates at 15 Hz and demonstrate successful collision avoidance after integrating the proposed method with a motion primitives-based planner.

I. INTRODUCTION

First Person View (FPV) drone pilots leverage a single forward-facing camera video stream transmitted over a radio feed and sensors embedded in the flight controller (e.g., IMU) to aggressively maneuver through dense clutter (e.g., through tree branches and under bridges).
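The rescaling idea in the abstract, mapping relative depth to metric depth using sparse 3D features with known metric depth, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the sketch assumes relative depth increases with metric depth, and a monotone piecewise-linear map stands in for the monotonic spline fit, whose exact form is not given in this excerpt. A global least-squares scale-and-shift fit is included as a simple point of comparison.

```python
import numpy as np

def fit_scale_shift(rel, metric):
    """Least-squares a, b such that a * rel + b approximates metric.

    rel, metric: 1D arrays sampled at the sparse feature locations
    (hypothetical inputs; e.g. relative depth at feature pixels and
    the features' metric depth from a visual-inertial system).
    """
    A = np.stack([rel, np.ones_like(rel)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, metric, rcond=None)
    return a, b

def fit_monotone_map(rel, metric, n_bins=8):
    """Fit a non-decreasing piecewise-linear map rel -> metric.

    Stand-in for a monotonic spline: sort samples by relative depth,
    take per-chunk medians as knots, then force the knot values to be
    non-decreasing with a running maximum.
    """
    order = np.argsort(rel)
    rel_chunks = np.array_split(rel[order], n_bins)
    met_chunks = np.array_split(metric[order], n_bins)
    knots_r = np.array([np.median(c) for c in rel_chunks if c.size])
    knots_m = np.array([np.median(c) for c in met_chunks if c.size])
    knots_m = np.maximum.accumulate(knots_m)  # enforce monotonicity
    return knots_r, knots_m

def apply_monotone_map(rel_image, knots_r, knots_m):
    """Rescale a dense relative depth image with the fitted knots.

    Values outside the knot range are clamped to the end knots.
    """
    return np.interp(rel_image, knots_r, knots_m)
```

A typical use would fit the map once per frame from the sparse features and then apply it to the full relative depth image, for example `apply_monotone_map(rel_image, *fit_monotone_map(rel_pts, metric_pts))`. The monotone constraint preserves the depth ordering predicted by the relative depth network, which a free-form fit to noisy sparse points would not guarantee.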
Sep-11-2025
- Genre:
  - Research Report (0.82)
- Industry:
  - Transportation > Air (0.86)
  - Information Technology > Robotics & Automation (0.54)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning (1.00)
    - Vision > Image Understanding (0.73)
    - Robots > Autonomous Vehicles
      - Drones (0.35)