Exploring Diverse Methods in Visual Question Answering

Panfeng Li, Qikai Yang, Xieming Geng, Wenjing Zhou, Zhicheng Ding, Yi Nian

arXiv.org Artificial Intelligence 

This study explores methods for improving Visual Question Answering (VQA) using Generative Adversarial Networks (GANs), autoencoders, and attention mechanisms. Leveraging a balanced VQA dataset, we investigate three distinct strategies. First, GAN-based approaches aim to generate answer embeddings conditioned on image and question inputs, showing potential but struggling with more complex tasks. Second, autoencoder-based techniques focus on learning optimal embeddings for questions and images, achieving results comparable to the GAN-based approach owing to stronger handling of complex questions. Last, attention mechanisms incorporating Multimodal Compact Bilinear pooling (MCB) address language priors and attention modeling, albeit with a complexity-performance trade-off. This study underscores the challenges and opportunities in VQA and suggests avenues for future research, including alternative GAN formulations and attention mechanisms.
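The abstract only names MCB without detailing it; as a rough, hypothetical illustration of how Multimodal Compact Bilinear pooling fuses image and question features (count-sketch projection of each modality, followed by element-wise multiplication in the FFT domain to approximate their outer product), the following PyTorch sketch may be useful. The function names, feature sizes, and output dimension `d` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def count_sketch(x, h, s, d):
    """Project x of shape (batch, n) into d dimensions using hash indices h and signs s."""
    sketch = x.new_zeros(x.size(0), d)
    sketch.index_add_(1, h, x * s)  # scatter-add signed features into hashed buckets
    return sketch

def mcb_pool(img_feat, q_feat, d=16000, seed=0):
    """Fuse image and question features with compact bilinear pooling (illustrative sketch)."""
    g = torch.Generator().manual_seed(seed)  # fixed hashes so the projection is reproducible
    n1, n2 = img_feat.size(1), q_feat.size(1)
    h1 = torch.randint(0, d, (n1,), generator=g)
    s1 = torch.randint(0, 2, (n1,), generator=g).float() * 2 - 1
    h2 = torch.randint(0, d, (n2,), generator=g)
    s2 = torch.randint(0, 2, (n2,), generator=g).float() * 2 - 1
    # Count-sketch each modality, then convolve them via element-wise product in the
    # FFT domain; this approximates the outer product of the two feature vectors.
    fused = torch.fft.irfft(
        torch.fft.rfft(count_sketch(img_feat, h1, s1, d)) *
        torch.fft.rfft(count_sketch(q_feat, h2, s2, d)), n=d)
    # Signed square root and L2 normalization, as is customary after bilinear pooling.
    fused = torch.sign(fused) * torch.sqrt(fused.abs() + 1e-10)
    return F.normalize(fused)

# Example with assumed sizes: 2048-d image features and 1024-d question features.
v = torch.randn(8, 2048)
q = torch.randn(8, 1024)
print(mcb_pool(v, q).shape)  # torch.Size([8, 16000])
```

The fused vector would then feed an answer classifier or, in the attention variant, be computed per spatial location to score image regions; the 16000-dimensional output used here follows the commonly reported MCB setting and is an assumption, not a figure taken from this paper.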
