Multimodal Learning and Reasoning for Visual Question Answering

Ilija Ilievski, Jiashi Feng

Neural Information Processing Systems 

Typically, a VQA model consists of two modules that learn the question and image representations, and a third module that fuses those representations into a single multimodal representation.
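
As a rough illustration of this three-module layout, the following minimal sketch wires together a question encoder, an image encoder, and a fusion step. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the dimension `DIM`, the hashed bag-of-words question encoder, the mean-pooled image encoder, and element-wise product fusion are all hypothetical stand-ins for learned modules.

```python
import hashlib
from typing import List

DIM = 8  # hypothetical representation size, chosen only for illustration


def encode_question(question: str, dim: int = DIM) -> List[float]:
    """Module 1 (assumed): hashed bag-of-words question representation."""
    vec = [0.0] * dim
    for token in question.lower().split():
        # Use a stable hash so the sketch is deterministic across runs.
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec


def encode_image(region_features: List[List[float]]) -> List[float]:
    """Module 2 (assumed): mean-pool per-region feature vectors."""
    n = len(region_features)
    dim = len(region_features[0])
    return [sum(region[i] for region in region_features) / n for i in range(dim)]


def fuse(q: List[float], v: List[float]) -> List[float]:
    """Module 3 (assumed): element-wise product of the two representations."""
    if len(q) != len(v):
        raise ValueError("representations must share a dimension")
    return [a * b for a, b in zip(q, v)]


# Toy inputs: two image regions with constant features, and a short question.
regions = [[1.0] * DIM, [3.0] * DIM]
question_vec = encode_question("what color is the cat")
image_vec = encode_image(regions)
multimodal = fuse(question_vec, image_vec)
```

In a real system each module would be a trained network, and the fused vector would be passed to an answer classifier; the sketch only shows how the three pieces connect.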
