Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding
Neural Information Processing Systems
The main question we address is: "Can we consolidate the 3D visual stream with 2D clues and efficiently utilize them in both the training and testing phases?" The main idea is to assist the 3D encoder by incorporating rich 2D object representations without requiring extra 2D inputs. To this end, we leverage 2D clues, synthetically generated from 3D point clouds, which we show empirically to improve the quality of the learned visual representations.
Dec-25-2025, 16:55:59 GMT