Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature

de Faria, Ana Cláudia Akemi Matsuki, Bastos, Felype de Castro, da Silva, José Victor Nogueira Alves, Fabris, Vitor Lopes, Uchoa, Valeska de Sousa, Neto, Décio Gonçalves de Aguiar, Santos, Claudio Filipi Goncalves dos

Jun-2-2023–arXiv.org Artificial Intelligence

Visual Question Answering (VQA) is a multidisciplinary artificial intelligence research problem that has attracted researchers from the computer vision, natural language processing, knowledge representation, and other machine learning communities. The task is to generate a natural language answer to a natural language question asked about an image. As the field has flourished in recent years, datasets, metrics, and models have been proposed, and the scope of research has expanded. Although artificial intelligence has solved several different problems, in areas such as image classification and natural language processing (NLP), problems that require different types of data remain hard to model. For instance, combining computer vision with NLP to retrieve information about an image in response to a question has challenged researchers for several years.

machine learning, natural language, question answering

Country:
- South America
  - Brazil (0.04)
  - Paraguay > Asunción
    - Asunción (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Texas > Travis County
      - Austin (0.04)
    - New York > New York County
      - New York City (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
- Europe
  - Switzerland (0.04)
  - Romania > București - Ilfov Development Region
    - Municipality of Bucharest > Bucharest (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Greece > Central Macedonia
    - Thessaloniki (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Middle East
    - Jordan (0.04)
    - Qatar > Ad-Dawhah
      - Doha (0.04)
    - Israel > Tel Aviv District
      - Tel Aviv (0.04)
  - Japan > Kyūshū & Okinawa
    - Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
- Africa > Central African Republic
  - Ombella-M'Poko > Bimbo (0.04)

Genre:
- Summary/Review (1.00)
- Research Report > New Finding (1.00)
- Overview (1.00)

Industry:
- Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Question Answering (1.00)
  - Cognitive Science > Problem Solving (0.88)
  - Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks > Deep Learning (1.00)
