Sherlock: Self-Correcting Reasoning in Vision-Language Models

Jun-13-2026, 07:58:39 GMT–Neural Information Processing Systems

Reasoning Vision-Language Models (VLMs) have shown promising performance on complex multimodal tasks. However, they still face significant challenges: they are highly sensitive to reasoning errors, require large volumes of annotated data or accurate verifiers, and struggle to generalize beyond specific domains. To address these limitations, we explore self-correction as a strategy to enhance reasoning VLMs. We first conduct an in-depth analysis of reasoning VLMs' self-correction abilities and identify key gaps. Based on our findings, we introduce Sherlock, a self-correction and self-improvement training framework.
