Attention on Attention: Architectures for Visual Question Answering (VQA)

Jasdeep Singh, Vincent Ying, Alex Nutkiewicz

Mar-20-2018–arXiv.org Artificial Intelligence

Visual Question Answering (VQA) is an increasingly popular topic in deep learning research, requiring coordination of natural language processing and computer vision modules into a single architecture. We build upon the model that placed first in the VQA Challenge by developing thirteen new attention mechanisms and introducing a simplified classifier. We spent 300 GPU hours on hyperparameter and architecture searches and achieved an evaluation score of 64.78%, outperforming the existing state-of-the-art single model's validation score of 63.15%.
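For readers unfamiliar with the term, the following is a minimal sketch of generic question-guided soft attention over image regions. It is not one of the thirteen mechanisms from the paper, which the abstract does not describe. The function names `softmax` and `attend`, the dot-product scoring, and the toy vectors are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(question_vec, region_feats):
    """Weight image regions by dot-product relevance to the question.

    question_vec: list of d floats (hypothetical question embedding)
    region_feats: list of k lists of d floats (hypothetical region features)
    Returns (weights, attended): one weight per region, and the
    weighted sum of the region features.
    """
    # One relevance score per region: dot product with the question vector.
    scores = [sum(q * r for q, r in zip(question_vec, region))
              for region in region_feats]
    # Normalize the scores into a probability distribution over regions.
    weights = softmax(scores)
    # Combine the regions into a single vector using the attention weights.
    dim = len(question_vec)
    attended = [sum(w * region[i] for w, region in zip(weights, region_feats))
                for i in range(dim)]
    return weights, attended

# Toy data (assumed): three regions, two feature dimensions.
question = [1.0, 0.0]
regions = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
weights, attended = attend(question, regions)
```

In this toy input the first region aligns with the question vector, so it receives the largest weight. In a full VQA model the attended vector would typically be fused with the question representation and passed to a classifier over candidate answers.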

machine learning, natural language, question answering
