Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models

Neural Information Processing Systems

To overcome this, we harness Visual Foundation Models (VFMs), which have revolutionized the acquisition of pixel-level semantics, to enhance 3D representation learning.