POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
Neural Information Processing Systems
We describe an approach to predict an open-vocabulary 3D semantic voxel occupancy map from input 2D images, with the objective of enabling 3D grounding, segmentation, and retrieval of free-form language queries. This is a challenging problem because of the 2D-3D ambiguity and the open-vocabulary nature of the target tasks, where obtaining annotated training data in 3D is difficult. The contributions of this work are three-fold.
Dec-26-2025, 10:51:18 GMT