See&Trek: Training-Free Spatial Prompting for Multimodal Large Language Model
Neural Information Processing Systems
We introduce SEE&TREK, the first training-free prompting framework tailored to enhance the spatial understanding of Multimodal Large Language Models (MLLMs) under vision-only constraints. While prior efforts have incorporated modalities like depth or point clouds to improve spatial reasoning, purely visual-spatial understanding remains underexplored.
Jun-17-2026, 12:17:07 GMT
- Country:
  - Asia (0.28)
- Genre:
  - Research Report > Experimental Study (1.00)
- Industry:
  - Information Technology (0.67)
  - Media (0.47)
- Technology: