Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data
Neural Information Processing Systems
Existing point cloud-based open-vocabulary 3D detection models are limited by their high deployment costs. In this work, we propose a novel open-vocabulary monocular 3D object detection framework, dubbed OVM3D-Det, which trains detectors using only RGB images, making it both cost-effective and scalable to publicly available data.
Nov-19-2025, 19:31:35 GMT
- Country:
  - Asia
    - China
      - Beijing > Beijing (0.04)
      - Guangxi Province > Nanning (0.04)
    - Japan > Honshū
      - Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
    - Myanmar > Tanintharyi Region
      - Dawei (0.04)
  - North America > United States
    - California > Santa Clara County > Palo Alto (0.04)
  - South America > Brazil (0.04)
- Genre:
  - Research Report
    - Experimental Study (0.93)
    - New Finding (0.68)
- Industry:
  - Information Technology > Services (0.34)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks (0.93)
    - Natural Language (1.00)
    - Representation & Reasoning (1.00)
    - Vision (1.00)