Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison

Yang, Qian, Yan, Weixiang, Agrawal, Aishwarya

Jul-11-2024–arXiv.org Artificial Intelligence

Despite tremendous advancements, current state-of-the-art Vision-Language Models (VLMs) are still far from perfect. They tend to hallucinate and may generate biased responses. In such circumstances, having a way to assess the reliability of a given response generated by a VLM is quite useful. Existing methods, such as estimating uncertainty using answer likelihoods or prompt-based confidence generation, often suffer from overconfidence. Other methods use self-consistency comparison but are affected by confirmation biases. To alleviate these, we propose \textbf{De}compose and \textbf{C}ompare \textbf{C}onsistency (\texttt{DeCC}) for reliability measurement. By comparing the consistency between the direct answer generated using the VLM's internal reasoning process, and the indirect answers obtained by decomposing the question into sub-questions and reasoning over the sub-answers produced by the VLM, \texttt{DeCC} measures the reliability of VLM's direct answer. Experiments across six vision-language tasks with three VLMs show \texttt{DeCC}'s reliability estimation achieves better correlation with task accuracy compared to the existing methods.

contradiction, entailment, vlm, (15 more...)

arXiv.org Artificial Intelligence

Jul-11-2024

arXiv.org PDF

Add feedback

Country:
- Asia > Singapore (0.04)
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
- North America
  - United States > California
    - Santa Barbara County > Santa Barbara (0.04)
  - Canada
    - Nunavut (0.04)
    - Quebec > Montreal (0.04)
    - Ontario > Toronto (0.04)
- Europe > Ireland
  - Leinster > County Dublin > Dublin (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning (1.00)
  - Natural Language > Large Language Model (0.51)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found