1fd6c4e41e2c6a6b092eb13ee72bce95-Paper.pdf
–Neural Information Processing Systems
Compositional generalization is a key challenge in grounding natural language tovisual perception. While deeplearning models haveachievedgreatsuccess in multimodal tasks likevisual question answering, recent studies haveshownthat they fail to generalize to new inputs that are simply an unseen combination of those seen inthetraining distribution [6].
Neural Information Processing Systems
Feb-7-2026, 18:16:02 GMT