ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images
–Neural Information Processing Systems
Open-vocabulary 3D object detection (OV-3Det) aims to generalize beyond the limited number of base categories labeled during the training phase. The biggest bottleneck is the scarcity of annotated 3D data, whereas 2D image datasets are abundant and richly annotated. Consequently, it is intuitive to leverage the wealth of annotations in 2D images to alleviate the inherent data scarcity in OV-3Det. In this paper, we push the task setup to its limits by exploring the potential of using solely 2D images to learn OV-3Det. The major challenges for this setup is the modality gap between training images and testing point clouds, which prevents effective integration of 2D knowledge into OV-3Det.
Neural Information Processing Systems
May-25-2025, 21:47:38 GMT
- Country:
- Asia (0.14)
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Information Technology (0.93)
- Technology: