OpenVox: Real-time Instance-level Open-vocabulary Probabilistic Voxel Representation
Deng, Yinan, Yao, Bicheng, Tang, Yihang, Yang, Yi, Yue, Yufeng
arXiv.org Artificial Intelligence
In recent years, vision-language models (VLMs) have advanced open-vocabulary mapping, enabling mobile robots to simultaneously achieve environmental reconstruction and high-level semantic understanding. While integrated object cognition helps mitigate semantic ambiguity in point-wise feature maps, efficiently obtaining rich semantic understanding and robust incremental reconstruction at the instance level remains challenging. To address these challenges, we introduce OpenVox, a real-time incremental open-vocabulary probabilistic instance voxel representation. In the front-end, we design an efficient instance segmentation and comprehension pipeline that enhances language reasoning by encoding captions. In the back-end, we implement probabilistic instance voxels and formulate the cross-frame incremental fusion process as two subtasks, instance association and live map evolution, ensuring robustness to sensor and segmentation noise. Extensive evaluations across multiple datasets demonstrate that OpenVox achieves state-of-the-art performance in zero-shot instance segmentation, semantic segmentation, and open-vocabulary retrieval. The project page of OpenVox is available at https://open-vox.github.io/.

I. INTRODUCTION

Accurate 3D scene reconstruction and understanding are essential for robotic downstream tasks.
Feb-23-2025
- Genre:
- Research Report (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks
- Deep Learning (0.46)
- Natural Language > Text Processing (0.68)
- Representation & Reasoning > Uncertainty (0.94)
- Robots (1.00)
- Vision (1.00)