Language Conditioned Spatial Relation Reasoning for 3D Object Grounding
Neural Information Processing Systems
Localizing objects in 3D scenes based on natural language requires understanding and reasoning about spatial relations. In particular, it is often crucial to distinguish between similar objects referred to by the text, such as "the leftmost chair" and "a chair next to the window". In this work, we propose a language-conditioned transformer model for grounding 3D objects and their spatial relations.
Mar-26-2025, 22:33:01 GMT