Latent Variable Models for Visual Question Answering

Jan-16-2021–arXiv.org Artificial Intelligence

Conventional models for Visual Question Answering (VQA) explore deterministic approaches with various types of image features, question features, and attention mechanisms. However, there exist other modalities that can be explored in addition to image and question pairs to bring extra information to the models. In this work, we propose latent variable models for VQA where extra information (e.g. captions and answer categories) are incorporated as latent variables to improve inference, which in turn benefits question-answering performance. Experiments on the VQA v2.0 benchmarking dataset demonstrate the effectiveness of our proposed models in that they improve over strong baselines, especially those that do not rely on extensive language-vision pre-training.

answer category, caption, category, (16 more...)

arXiv.org Artificial Intelligence

Jan-16-2021

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Italy
  - Tuscany > Florence (0.04)
- Asia > China
  - Hong Kong (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Question Answering (0.92)
  - Machine Learning
    - Learning Graphical Models (0.87)
    - Neural Networks > Deep Learning (0.69)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found