Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature

de Faria, Ana Cláudia Akemi Matsuki, Bastos, Felype de Castro, da Silva, José Victor Nogueira Alves, Fabris, Vitor Lopes, Uchoa, Valeska de Sousa, Neto, Décio Gonçalves de Aguiar, Santos, Claudio Filipi Goncalves dos

arXiv.org Artificial Intelligence 

Visual Question Answering (VQA) is a multi-disciplinary artificial intelligence research problem that has attracted the attention of researchers from computer vision, natural language processing, knowledge representation, and other machine learning communities. To solve that question, VQA is a task of generating natural language answers when a question in natural language is asked related to an image. In recent years, visual question answering as a result of the flourish in this field, datasets, metrics, and models have been proposed, and the scope of research has been expanded. Although artificial intelligence has solved several different problems, such as image classification and natural language processing (NLP), it is hard to model a problem which needs different types of data. For instance, mixing computer vision with NLP to retrieve some information about an image from a question has tricked researchers for several years.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found