VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
