1fd6c4e41e2c6a6b092eb13ee72bce95-Paper.pdf

Neural Information Processing Systems 

Compositional generalization is a key challenge in grounding natural language tovisual perception. While deeplearning models haveachievedgreatsuccess in multimodal tasks likevisual question answering, recent studies haveshownthat they fail to generalize to new inputs that are simply an unseen combination of those seen inthetraining distribution [6].

Similar Docs  Excel Report  more

TitleSimilaritySource
None found