Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
–Neural Information Processing Systems
Complex 3D scene understanding has gained increasing attention, and scene encoding strategies built on visual foundation models have played a crucial role in this progress. However, the optimal scene encoding strategy for a given scenario remains unclear, particularly in comparison with image-based counterparts. To address this issue, we present the first comprehensive study that probes various visual encoding models for 3D scene understanding, identifying the strengths and limitations of each model across different scenarios.
Dec-26-2025, 13:46:02 GMT