MOSU: Autonomous Long-range Robot Navigation with Multi-modal Scene Understanding
Jing Liang, Kasun Weerakoon, Daeun Song, Senthurbavan Kirubaharan, Xuesu Xiao, Dinesh Manocha
arXiv.org Artificial Intelligence
We present MOSU, a novel autonomous long-range navigation system that enhances global navigation for mobile robots through multi-modal perception and on-road scene understanding. MOSU addresses the outdoor robot navigation challenge by integrating geometric, semantic, and contextual information to ensure comprehensive scene understanding. The system combines GPS and QGIS map-based routing for high-level global path planning with multi-modal trajectory generation for local navigation refinement. For trajectory generation, MOSU leverages multiple modalities: LiDAR-based geometric data for precise obstacle avoidance, image-based semantic segmentation for traversability assessment, and Vision-Language Models (VLMs) to capture social context and enable the robot to adhere to social norms in complex environments. This multi-modal integration improves scene understanding and enhances traversability, allowing the robot to adapt to diverse outdoor conditions. We evaluate our system in real-world on-road environments and benchmark it on the GND dataset, achieving a 10% improvement in traversability on navigable terrains while maintaining a navigation distance comparable to existing global navigation methods.
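The abstract describes fusing three modalities, geometric clearance from LiDAR, semantic traversability from image segmentation, and a VLM-derived social-context signal, when generating local trajectories. The paper's actual fusion rule is not given here, so the following is only a minimal illustrative sketch of one common approach (a weighted linear score over candidate trajectories); the function names, weights, and score ranges are assumptions, not MOSU's implementation.

```python
import numpy as np

def score_trajectories(geom_clearance, sem_traversability, social_penalty,
                       weights=(0.4, 0.4, 0.2)):
    """Fuse per-trajectory scores from three modalities into one score.

    geom_clearance:     LiDAR-based obstacle clearance, assumed in [0, 1]
                        (higher = safer).
    sem_traversability: fraction of the trajectory lying on terrain the
                        segmentation model labels navigable, in [0, 1].
    social_penalty:     penalty for violating social norms (e.g. cutting
                        through a group of pedestrians), higher = worse.
    """
    w_g, w_s, w_v = weights
    # Clip clearance so all modalities share a comparable [0, 1] scale.
    g = np.clip(np.asarray(geom_clearance, dtype=float), 0.0, 1.0)
    s = np.asarray(sem_traversability, dtype=float)
    v = np.asarray(social_penalty, dtype=float)
    # Higher combined score = better trajectory.
    return w_g * g + w_s * s - w_v * v

def select_trajectory(geom, sem, social):
    """Return the index of the best candidate and all fused scores."""
    scores = score_trajectories(geom, sem, social)
    return int(np.argmax(scores)), scores

# Example: three candidate trajectories. The first is clear of obstacles
# but off navigable terrain; the second crosses a social zone; the third
# balances all three cues and wins.
best, scores = select_trajectory(
    geom=[0.9, 0.5, 0.8],
    sem=[0.2, 0.95, 0.9],
    social=[0.0, 0.8, 0.1],
)
```

In practice the weights would be tuned (or learned) per environment; the point of the sketch is only that each modality contributes an independent, normalized term to a single ranking of candidate trajectories.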
Jul-8-2025