Multi-modal Situated Reasoning in 3D Scenes

Neural Information Processing Systems 

Comprehensive evaluations on MSQA and MSNN highlight the limitations of existing vision-language models and underscore the importance of handling multi-modal interleaved inputs and situation modeling.
