Prompting Large Language Models with Rationale Heuristics for Knowledge-based Visual Question Answering

Hu, Zhongjian; Yang, Peng; Li, Bing; Liu, Fengyuan

arXiv.org Artificial Intelligence 

Recently, Large Language Models (LLMs) have been used for knowledge-based Visual Question Answering (VQA). Despite the encouraging results of previous studies, prior methods prompt LLMs to predict answers directly, neglecting the intermediate thought process. We argue that such methods do not fully activate the capacities of LLMs. We propose PLRH, a framework that Prompts LLMs with Rationale Heuristics for knowledge-based VQA. PLRH first prompts LLMs with Chain of Thought (CoT) to generate rationale heuristics, i.e., intermediate thought processes, and then leverages these rationale heuristics to inspire LLMs to predict answers. Experiments show that our approach outperforms existing baselines by more than 2.2 and 2.1 points on OK-VQA and A-OKVQA, respectively.
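
The abstract describes a two-stage prompting pipeline. The sketch below is a minimal illustration of that flow, not the paper's implementation: `complete` stands in for any text-completion API, the image is assumed to be represented by a `caption` string, and the prompt templates are hypothetical.

```python
# Minimal sketch of the two-stage PLRH prompting flow described above.
# `complete` is any callable that maps a prompt string to a completion;
# the prompt wording here is an illustrative assumption.
from typing import Callable

def plrh_answer(question: str, caption: str,
                complete: Callable[[str], str]) -> str:
    """Two-stage prompting: CoT rationale first, then rationale-guided answer."""
    # Stage 1: elicit an intermediate thought process (a rationale heuristic)
    # with a Chain-of-Thought style prompt.
    rationale_prompt = (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        "Let's think step by step."
    )
    rationale = complete(rationale_prompt)

    # Stage 2: feed the rationale back as a heuristic so the model
    # commits to a short final answer.
    answer_prompt = (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        f"Rationale: {rationale}\n"
        "Therefore, the answer is:"
    )
    return complete(answer_prompt).strip()
```

Any LLM client can be plugged in as `complete`, e.g. a thin wrapper around a chat-completion API that returns the generated text.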
