AITopics | gvqa

Neural Information Processing Systems http://nips.cc/

question-only adversary, question-only model, vqa model, (14 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.72)

Add feedback

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee

Neural Information Processing SystemsNov-20-2025, 17:03:01 GMT

Modern Visual Question Answering (VQA) models have been shown to rely heavily on superficial correlations between question and answer words learned during training - e.g .

machine learning, natural language, question answering, (18 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.72)

Add feedback

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Ramakrishnan, Sainandan, Agrawal, Aishwarya, Lee, Stefan

Neural Information Processing SystemsDec-31-2018

Modern Visual Question Answering (VQA) models have been shown to rely heavily on superficial correlations between question and answer words learned during training -- \eg overwhelmingly reporting the type of room as kitchen or the sport being played as tennis, irrespective of the image. Most alarmingly, this shortcoming is often not well reflected during evaluation because the same strong priors exist in test distributions; however, a VQA system that fails to ground questions in image content would likely perform poorly in real-world settings. In this work, we present a novel regularization scheme for VQA that reduces this effect. We introduce a question-only model that takes as input the question encoding from the VQA model and must leverage language biases in order to succeed. We then pose training as an adversarial game between the VQA model and this question-only adversary -- discouraging the VQA model from capturing language biases in its question encoding.Further, we leverage this question-only model to estimate the mutual information between the image and answer given the question, which we maximize explicitly to encourage visual grounding. Our approach is a model agnostic training procedure and simple to implement. We show empirically that it can improve performance significantly on a bias-sensitive split of the VQA dataset for multiple base models -- achieving state-of-the-art on this task. Further, on standard VQA tasks, our approach shows significantly less drop in accuracy compared to existing bias-reducing VQA models.

machine learning, natural language, question answering, (18 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.72)

Add feedback

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Ramakrishnan, Sainandan, Agrawal, Aishwarya, Lee, Stefan

Neural Information Processing SystemsDec-31-2018

Modern Visual Question Answering (VQA) models have been shown to rely heavily on superficial correlations between question and answer words learned during training -- \eg overwhelmingly reporting the type of room as kitchen or the sport being played as tennis, irrespective of the image. Most alarmingly, this shortcoming is often not well reflected during evaluation because the same strong priors exist in test distributions; however, a VQA system that fails to ground questions in image content would likely perform poorly in real-world settings. In this work, we present a novel regularization scheme for VQA that reduces this effect. We introduce a question-only model that takes as input the question encoding from the VQA model and must leverage language biases in order to succeed. We then pose training as an adversarial game between the VQA model and this question-only adversary -- discouraging the VQA model from capturing language biases in its question encoding.Further, we leverage this question-only model to estimate the mutual information between the image and answer given the question, which we maximize explicitly to encourage visual grounding. Our approach is a model agnostic training procedure and simple to implement. We show empirically that it can improve performance significantly on a bias-sensitive split of the VQA dataset for multiple base models -- achieving state-of-the-art on this task. Further, on standard VQA tasks, our approach shows significantly less drop in accuracy compared to existing bias-reducing VQA models.

machine learning, natural language, question answering, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.72)

Add feedback

Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering

Agrawal, Aishwarya, Batra, Dhruv, Parikh, Devi, Kembhavi, Aniruddha

arXiv.org Artificial IntelligenceJun-3-2018

A number of studies have found that today's Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data and lack sufficient image grounding. To encourage development of models geared towards the latter, we propose a new setting for VQA where for every question type, train and test sets have different prior distributions of answers. Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2 respectively). First, we evaluate several existing VQA models under this new setting and show that their performance degrades significantly compared to the original VQA setting. Second, we propose a novel Grounded Visual Question Answering model (GVQA) that contains inductive biases and restrictions in the architecture specifically designed to prevent the model from 'cheating' by primarily relying on priors in the training data. Specifically, GVQA explicitly disentangles the recognition of visual concepts present in the image from the identification of plausible answer space for a given question, enabling the model to more robustly generalize across different distributions of answers. GVQA is built off an existing VQA model -- Stacked Attention Networks (SAN). Our experiments demonstrate that GVQA significantly outperforms SAN on both VQA-CP v1 and VQA-CP v2 datasets. Interestingly, it also outperforms more powerful VQA models such as Multimodal Compact Bilinear Pooling (MCB) in several cases. GVQA offers strengths complementary to SAN when trained and evaluated on the original VQA v1 and VQA v2 datasets. Finally, GVQA is more transparent and interpretable than existing VQA models.

gvqa, machine learning, question answering, (20 more...)

arXiv.org Artificial Intelligence

1712.00377

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Filters

Collaborating Authors

gvqa

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering