A survey on VQA: Datasets and Approaches
–arXiv.org Artificial Intelligence
Visual question answering (VQA) is a task that combines techniques from computer vision and natural language processing. It requires a model to answer a text-based question using the information contained in a visual input. In recent years, the VQA research field has expanded: work that examines models' reasoning ability and work on VQA over scientific diagrams have both received more attention. Meanwhile, more multimodal feature fusion mechanisms have been proposed. This paper reviews and analyzes existing datasets, metrics, and models proposed for the VQA task.
May-2-2021
- Country:
- North America > United States
- Iowa > Johnson County > Iowa City (0.04)
- Asia > China
- Shaanxi Province > Xi'an (0.04)
- Genre:
- Research Report (0.50)
- Industry:
- Leisure & Entertainment (0.93)
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Machine Learning > Neural Networks (0.68)
- Natural Language > Question Answering (0.55)