MAGIC-VQA: Multimodal And Grounded Inference with Commonsense Knowledge for Visual Question Answering

Yang, Shuo; Luo, Siwen; Han, Soyeon Caren; Hovy, Eduard

Mar-24-2025, arXiv.org Artificial Intelligence

Visual Question Answering (VQA) requires reasoning across visual and textual modalities, yet Large Vision-Language Models (LVLMs) often lack integrated commonsense knowledge, limiting their robustness in real-world scenarios. To address this, we introduce MAGIC-VQA, a novel framework that enhances VQA by systematically integrating commonsense knowledge with LVLMs. MAGIC-VQA employs a three-stage process: (1) Explicit Knowledge Integration from external sources, (2) By-Type Post-Processing for contextual refinement, and (3) Implicit Knowledge Augmentation using a Graph Neural Network (GNN) for structured reasoning. GNNs bring greater depth to structured inference, enabling superior relational inference beyond LVLMs. MAGIC-VQA bridges a key gap by unifying commonsense knowledge with LVLM-driven reasoning, eliminating the need for extensive pre-training or complex prompt tuning. Our framework achieves state-of-the-art performance on benchmark datasets, significantly improving commonsense reasoning in VQA.
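
The three stages named in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the `Triple` record, the toy knowledge base, the relevance scores, and every function name are hypothetical. "Type" is read here as the relation type of a triple, and the GNN is replaced by a single untrained mean-aggregation step. How the stage 3 output reaches the LVLM is not described in the abstract, so the sketch only returns it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str
    score: float  # hypothetical relevance score, not from the paper


# Toy stand-in for an external commonsense source (hypothetical data).
TOY_KB = [
    Triple("umbrella", "UsedFor", "staying dry", 0.9),
    Triple("umbrella", "AtLocation", "closet", 0.4),
    Triple("rain", "Causes", "wet ground", 0.8),
    Triple("dog", "CapableOf", "bark", 0.7),
]


def retrieve_explicit(entities, kb):
    """Stage 1 (sketch): fetch triples whose head matches a detected entity."""
    wanted = set(entities)
    return [t for t in kb if t.head in wanted]


def refine_by_type(triples, allowed_relations, top_k):
    """Stage 2 (sketch): keep allowed relation types, then the top_k by score."""
    kept = [t for t in triples if t.relation in allowed_relations]
    return sorted(kept, key=lambda t: t.score, reverse=True)[:top_k]


def propagate(triples, seed, steps=1):
    """Stage 3 (sketch): mean-aggregation message passing over the triple graph.

    An untrained stand-in for a GNN layer: each node's value becomes the
    average of its own value and its neighbours' values.
    """
    neighbours = defaultdict(set)
    for t in triples:
        neighbours[t.head].add(t.tail)
        neighbours[t.tail].add(t.head)
    values = {node: seed.get(node, 0.0) for node in neighbours}
    for _ in range(steps):
        values = {
            node: (values[node] + sum(values[n] for n in nbrs)) / (1 + len(nbrs))
            for node, nbrs in neighbours.items()
        }
    return values


def build_prompt(question, triples):
    """Assemble a text prompt that a vision-language model could receive."""
    facts = "; ".join(f"{t.head} {t.relation} {t.tail}" for t in triples)
    return f"Question: {question}\nKnowledge: {facts}"


entities = ["umbrella", "rain"]  # pretend these came from the image and question
explicit = retrieve_explicit(entities, TOY_KB)
refined = refine_by_type(explicit, {"UsedFor", "Causes"}, top_k=2)
values = propagate(refined, seed={e: 1.0 for e in entities})
prompt = build_prompt("Why is the person holding an umbrella?", refined)
print(prompt)
```

In this toy run, stage 1 returns three triples, stage 2 drops the `AtLocation` triple, and stage 3 spreads the seed values from the detected entities to their neighbouring concepts. A trained GNN would learn the aggregation weights rather than use a plain average.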

large language model, machine learning, natural language

Genre:
- Research Report (0.64)

Industry:
- Education (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Commonsense Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.71)