OpenDlign: Open-World Point Cloud Understanding with Depth-Aligned Images

Neural Information Processing Systems 

Recent open-world 3D representation learning methods that use Vision-Language Models (VLMs) to align 3D point clouds with image-text information have shown strong zero-shot 3D performance.