Agentic Reasoning for Robust Vision Systems via Increased Test-Time Compute

Chung-En Yu, Brian Jalaian, Nathaniel D. Bastian

Sep-23-2025–arXiv.org Artificial Intelligence

Developing trustworthy intelligent vision systems for high-stakes domains, e.g., remote sensing and medical diagnosis, demands broad robustness without costly retraining. We propose Visual Reasoning Agent (VRA), a training-free, agentic reasoning framework that wraps off-the-shelf vision-language models and pure vision systems in a Think–Critique–Act loop. While VRA incurs significant additional test-time computation, it achieves up to 40% absolute accuracy gains on challenging visual reasoning benchmarks. Future work will optimize query routing and early stopping to reduce inference overhead while preserving reliability in vision tasks.
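The abstract does not describe how the loop is implemented. As a rough illustration only, here is a minimal Python sketch of a generic Think–Critique–Act loop with a round budget and early stopping. The names `think_critique_act`, `think`, `critique`, `act`, `max_rounds`, and the toy stand-ins are all hypothetical. In a real system, `think` and `critique` might be language-model calls and `act` a call to the wrapped vision model, but that is an assumption and not VRA's published design.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Step:
    candidate: str   # answer proposed in the Think phase
    accepted: bool   # verdict from the Critique phase
    feedback: str    # critic's explanation, used to steer the Act phase


@dataclass
class LoopResult:
    answer: str
    steps: List[Step] = field(default_factory=list)


def think_critique_act(
    question: str,
    think: Callable[[str, List[str]], str],
    critique: Callable[[str, str], Tuple[bool, str]],
    act: Callable[[str], str],
    max_rounds: int = 3,
) -> LoopResult:
    """Hypothetical Think-Critique-Act loop; not VRA's published implementation.

    think:    (question, evidence) -> candidate answer (assumed reasoner)
    critique: (question, candidate) -> (accepted, feedback) (assumed critic)
    act:      feedback -> new evidence (assumed call to a wrapped vision model)
    max_rounds caps the extra test-time compute spent per query.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    evidence: List[str] = []
    steps: List[Step] = []
    candidate = ""
    for i in range(max_rounds):
        candidate = think(question, evidence)                # Think
        accepted, feedback = critique(question, candidate)  # Critique
        steps.append(Step(candidate, accepted, feedback))
        if accepted:
            break  # early stop: skip remaining rounds once the critic accepts
        if i + 1 < max_rounds:
            evidence.append(act(feedback))                   # Act: gather more evidence
    # If no candidate is accepted within the budget, return the last one.
    return LoopResult(answer=candidate, steps=steps)


# Toy stand-ins (hypothetical) so the sketch runs without any model.
def toy_think(question: str, evidence: List[str]) -> str:
    return "cat" if any("pointed ears" in e for e in evidence) else "dog"


def toy_critique(question: str, candidate: str) -> Tuple[bool, str]:
    ok = candidate == "cat"  # toy critic with oracle knowledge of the label
    return ok, ("" if ok else "inspect the ears")


def toy_act(feedback: str) -> str:
    return "detector reports pointed ears"


result = think_critique_act("What animal is shown?", toy_think, toy_critique, toy_act)
print(result.answer, len(result.steps))  # cat 2
```

In this sketch, each rejected round costs one reasoner call, one critic call and one tool call. That accounts for the additional test-time compute. Lowering `max_rounds` or loosening the acceptance criterion is one simple way to trade accuracy for latency.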

artificial intelligence, large language model, natural language

Country:
- North America > United States (0.69)

Genre:
- Research Report (0.82)

Industry:
- Health & Medicine > Diagnostic Medicine (0.35)
- Energy > Renewable
  - Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.37)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language > Large Language Model (0.30)