Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding

Yunze Man, Martial Hebert

Neural Information Processing Systems 

Complex 3D scene understanding has gained increasing attention, with scene encoding strategies built on top of visual foundation models playing a crucial role in recent progress. However, the optimal scene encoding strategies for various scenarios remain unclear, particularly compared to their image-based counterparts. To address this issue, we present the first comprehensive study that probes various visual encoding models for 3D scene understanding, identifying the strengths and limitations of each model across different scenarios.
