POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
David Hurych
Neural Information Processing Systems
We describe an approach to predict an open-vocabulary 3D semantic voxel occupancy map from input 2D images, with the objective of enabling 3D grounding, segmentation, and retrieval of free-form language queries. This is a challenging problem because of the 2D-3D ambiguity and the open-vocabulary nature of the target tasks, for which annotated 3D training data is difficult to obtain. The contributions of this work are three-fold.
Feb-11-2025, 06:38:12 GMT