See&Trek: Training-Free Spatial Prompting for Multimodal Large Language Model
Neural Information Processing Systems
We introduce See&Trek, the first training-free prompting framework tailored to enhance the spatial understanding of Multimodal Large Language Models (MLLMs) under vision-only constraints. While prior efforts have incorporated modalities like depth or point clouds to improve spatial reasoning, purely visual-spatial understanding remains underexplored.
Jun-12-2026, 07:51:10 GMT