MorphoNavi: Aerial-Ground Robot Navigation with Object Oriented Mapping in Digital Twin

Karaf, Sausar, Martynov, Mikhail, Sautenkov, Oleg, Darush, Zhanibek, Tsetserukou, Dzmitry

arXiv.org Artificial Intelligence 

This paper presents a novel mapping approach for a universal aerial-ground robotic system utilizing a single monocular camera. The proposed system is capable of detecting a diverse range of objects and estimating their positions without requiring fine-tuning for specific environments. The system's performance was evaluated through a simulated search-and-rescue scenario, where the MorphoGear robot successfully located a robotic dog while an operator monitored the process. This work contributes to the development of intelligent, multimodal robotic systems capable of operating in unstructured environments.

Robotics has experienced rapid advancements in recent years, with Vision-Language Models (VLMs) emerging as a powerful tool for mission execution based on RGB images. Since VLMs require only an image and a text prompt as input, they eliminate the need for expensive and specialized sensors such as LiDARs and depth cameras. This simplicity and cost-effectiveness suggest that vision-language-based control will play a crucial role in the future of robotics, with cameras becoming the primary sensor for most robotic systems. In this paper, we introduce a novel mapping approach designed for a universal air-ground robotic system using a single monocular camera.