GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks

Xinlu Zhang, Yujie Lu, Weizhi Wang, An Yan, Jun Yan, Lianke Qin, Heng Wang, Xifeng Yan, William Yang Wang, Linda Ruth Petzold

Nov-2-2023–arXiv.org Artificial Intelligence

Automatically evaluating vision-language tasks is challenging, especially when the goal is to reflect human judgments, because automatic metrics have limited ability to account for fine-grained details. Although GPT-4V has shown promising results on various multi-modal tasks, its use as a generalist evaluator for these tasks has not yet been systematically explored. We comprehensively validate GPT-4V's capabilities for evaluation, covering tasks that range from foundational image-to-text and text-to-image synthesis to high-level image-to-image translation and multi-image-to-text alignment. We employ two evaluation methods with GPT-4V: single-answer grading and pairwise comparison. Notably, GPT-4V shows promising agreement with humans across various tasks and evaluation methods, demonstrating the strong potential of multi-modal LLMs as evaluators. Despite limitations such as restricted grading of visual clarity and difficulty with complex real-world reasoning, its ability to provide human-aligned scores enriched with detailed explanations makes it a promising universal automatic evaluator.

evaluation, evaluator, gpt-4v


Country:
- Asia > China (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Michigan > Washtenaw County
      - Ann Arbor (0.04)
    - California
      - Santa Barbara County > Santa Barbara (0.04)
      - San Diego County > San Diego (0.04)
- Europe
  - Switzerland > Zürich
    - Zürich (0.14)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
- Africa > Middle East
  - Egypt (0.04)

Genre:
- Research Report (1.00)

Industry:
- Leisure & Entertainment (0.93)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Natural Language > Large Language Model (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (0.30)
