Generating Rationales in Visual Question Answering

Ayyubi, Hammad A., Tanjim, Md. Mehrab, McAuley, Julian J., Cottrell, Garrison W.

Apr-4-2020–arXiv.org Artificial Intelligence

Despite recent advances in Visual QuestionAnswering (VQA), it remains a challenge todetermine how much success can be attributedto sound reasoning and comprehension ability.We seek to investigate this question by propos-ing a new task ofrationale generation. Es-sentially, we task a VQA model with generat-ing rationales for the answers it predicts. Weuse data from the Visual Commonsense Rea-soning (VCR) task, as it contains ground-truthrationales along with visual questions and an-swers. We first investigate commonsense un-derstanding in one of the leading VCR mod-els, ViLBERT, by generating rationales frompretrained weights using a state-of-the-art lan-guage model, GPT-2. Next, we seek to jointlytrain ViLBERT with GPT-2 in an end-to-endfashion with the dual task of predicting the an-swer in VQA and generating rationales. Weshow that this kind of training injects com-monsense understanding in the VQA modelthrough quantitative and qualitative evaluationmetrics

computer vision, devi parikh, rationale, (13 more...)

arXiv.org Artificial Intelligence

Apr-4-2020

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Pennsylvania > Philadelphia County
    - Philadelphia (0.04)
  - California > San Diego County
    - San Diego (0.04)
- Europe > Spain
  - Catalonia > Barcelona Province > Barcelona (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Chatbot (0.56)
    - Question Answering (0.55)
    - Large Language Model (0.46)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found