Goto

Collaborating Authors

 Asia


ImOV3D: LearningOpen-VocabularyPointClouds 3DObjectDetectionfromOnly2DImages

Neural Information Processing Systems

Open-vocabulary 3D object detection (OV-3Det) aims to generalize beyond the limited number ofbasecategories labeled during thetraining phase. Thebiggest bottleneck is the scarcity of annotated 3D data, whereas 2D image datasets are abundantandrichlyannotated.









Multi-modalSituated Reasoningin3DScenes

Neural Information Processing Systems

Comprehensiveevaluationson MSQA andMSNN highlight thelimitations ofexisting vision-language models and underscore the importance of handling multi-modal interleaved inputs and situation modeling.