Generating Rationales in Visual Question Answering
Ayyubi, Hammad A., Tanjim, Md. Mehrab, McAuley, Julian J., Cottrell, Garrison W.
–arXiv.org Artificial Intelligence
Despite recent advances in Visual Question Answering (VQA), it remains a challenge to determine how much success can be attributed to sound reasoning and comprehension ability. We seek to investigate this question by proposing a new task of rationale generation. Essentially, we task a VQA model with generating rationales for the answers it predicts. We use data from the Visual Commonsense Reasoning (VCR) task, as it contains ground-truth rationales along with visual questions and answers. We first investigate commonsense understanding in one of the leading VCR models, ViLBERT, by generating rationales from pretrained weights using a state-of-the-art language model, GPT-2. Next, we seek to jointly train ViLBERT with GPT-2 in an end-to-end fashion with the dual task of predicting the answer in VQA and generating rationales. We show that this kind of training injects commonsense understanding in the VQA model through quantitative and qualitative evaluation metrics.
Apr-4-2020
- Country:
- North America > United States
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- California > San Diego County
- San Diego (0.04)
- Europe > Spain
- Catalonia > Barcelona Province > Barcelona (0.04)
- Genre:
- Research Report (0.64)
- Technology: