Unveiling the Tapestry of Consistency in Large Vision-Language Models

Mar-22-2026, 15:32:44 GMT–Neural Information Processing Systems

Large vision-language models (LVLMs) have recently achieved rapid progress, exhibiting great perception and reasoning abilities concerning visual information. However, when faced with prompts in different sizes of solution spaces, LVLMs fail to always give consistent answers regarding the same knowledge point. This inconsistency of answers between different solution spaces is prevalent in LVLMs and erodes trust. To this end, we provide a multi-modal benchmark ConBench, to intuitively analyze how LVLMs perform when the solution space of a prompt revolves around a knowledge point. Based on the ConBench tool, we are the first to reveal the tapestry and get the following findings: (1) In the discriminate realm, the larger the solution space of the prompt, the lower the accuracy of the answers.

artificial intelligence, proceedings, solution space, (8 more...)

Neural Information Processing Systems

Mar-22-2026, 15:32:44 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence (0.82)