1fd6c4e41e2c6a6b092eb13ee72bce95-AuthorFeedback.pdf

Neural Information Processing Systems

GVQA (from VQA-CP) builds on stacked attention networks (SAN). However, SAN and, by extension, GVQA architectures do not evaluate for, and generalize poorly on, unseen combinations of object attributes (CLEVR-CoGenT) and linguistic structural patterns (CLOSURE). The language parser is not trained; it constructs text (s) object graphs (Gs) using a rules-based entity recognizer [L126]. (W3) CLOSURE Clarity. Minor clarifications - 5a: corrected in the camera-ready version.