Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering

Jun-21-2026, 17:15:21 GMT–Neural Information Processing Systems

Knowledge-based visual question answering (KB-VQA) requires vision-language models (VLMs) to integrate visual understanding with external knowledge retrieval. Although retrieval-augmented generation (RAG) has achieved significant advances on this task by incorporating knowledge-base querying, it still struggles with the quality of multimodal queries and the relevance of retrieved results. To overcome these challenges, we propose a novel three-stage method, termed Wiki-PRF, comprising Processing, Retrieval, and Filtering stages.
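The flow through the three stages can be illustrated with a deliberately simplified, text-only sketch. This is not the Wiki-PRF implementation. The names `process`, `retrieve`, and `filter_passages` are hypothetical. So are the other stand-ins: an image caption substitutes for visual processing, Jaccard token overlap substitutes for multimodal retrieval, and the `min_overlap` threshold substitutes for a learned filter. The sketch only shows how a processed query feeds retrieval and how a filter can discard retrieved passages that are not relevant to the question.

```python
import re
from dataclasses import dataclass


@dataclass
class Passage:
    title: str
    text: str


def tokenize(text: str) -> set[str]:
    # Lowercase alphanumeric tokens; a crude stand-in for real encoders.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # Size of the intersection divided by size of the union (0.0 if both empty).
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def process(question: str, image_caption: str) -> str:
    # Stage 1 (hypothetical): fold visual evidence, here a caption string,
    # into the text query so retrieval sees both the image and the question.
    return f"{image_caption} {question}"


def retrieve(query: str, kb: list[Passage], k: int = 3) -> list[Passage]:
    # Stage 2 (hypothetical): rank knowledge-base passages by token overlap
    # with the processed query and keep the top-k.
    q = tokenize(query)
    ranked = sorted(
        kb,
        key=lambda p: jaccard(q, tokenize(p.title + " " + p.text)),
        reverse=True,
    )
    return ranked[:k]


def filter_passages(question: str, passages: list[Passage],
                    min_overlap: int = 2) -> list[Passage]:
    # Stage 3 (hypothetical): drop retrieved passages whose text shares fewer
    # than `min_overlap` tokens with the original question.
    q = tokenize(question)
    return [p for p in passages if len(q & tokenize(p.text)) >= min_overlap]


kb = [
    Passage("Eiffel Tower", "The Eiffel Tower in Paris was completed in 1889."),
    Passage("Statue of Liberty", "The Statue of Liberty in New York was dedicated in 1886."),
    Passage("Big Ben", "Big Ben is the nickname of the great bell in London."),
]
question = "When was this tower completed?"
caption = "a tall iron tower in Paris"

query = process(question, caption)
top = retrieve(query, kb, k=2)        # Eiffel Tower, Statue of Liberty
kept = filter_passages(question, top)
print([p.title for p in kept])        # ['Eiffel Tower']
```

In this toy run, retrieval returns one relevant and one loosely related passage, and the filter removes the latter because it shares only the token "was" with the question. That separation of recall (retrieval) from precision (filtering) is the general idea the sketch is meant to convey.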

information, large language model, machine learning

Country:
- Europe (1.00)
- North America
  - United States (0.68)
  - Canada (0.46)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Expert Systems (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.66)