Open Vocabulary 3D Occupancy Prediction from Images

Neural Information Processing Systems

We describe an approach to predict an open-vocabulary 3D semantic voxel occupancy map from input 2D images, with the objective of enabling 3D grounding, segmentation, and retrieval of free-form language queries.



HT-Step: Aligning Instructional Articles with How-To Videos

Neural Information Processing Systems

Our dataset significantly surpasses existing labeled step datasets in terms of scale, number of tasks, and richness of natural language step descriptions.