Enhancing LLM Reasoning via Vision-Augmented Prompting

May-29-2025, 01:01:56 GMT–Neural Information Processing Systems

Verbal and visual-spatial information processing are two critical subsystems that activate different brain regions and often collaborate in cognitive reasoning. Despite the rapid advancement of LLM-based reasoning, mainstream frameworks such as Chain-of-Thought (CoT) and its variants focus primarily on the verbal dimension, which limits their ability to tackle reasoning problems that involve visual and spatial clues. To bridge this gap, we propose a novel dual-modality reasoning framework called Vision-Augmented Prompting (VAP).

large language model, machine learning, natural language

Country:
- Europe > Austria
  - Vienna (0.14)
- North America > United States (0.28)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (1.00)
- Workflow (1.00)

Industry:
- Information Technology > Security & Privacy (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Cognitive Science (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Representation & Reasoning (1.00)
  - Vision (1.00)