VHELM: A Holistic Evaluation of Vision Language Models

Mar-22-2026, 22:52:54 GMT–Neural Information Processing Systems

Current benchmarks for assessing vision-language models (VLMs) often focus on their perception or problem-solving capabilities and neglect other critical aspects such as fairness, multilinguality, or toxicity. Furthermore, they differ in their evaluation procedures and the scope of the evaluation, making it difficult to compare models. To address these issues, we extend the HELM framework to VLMs to present the Holistic Evaluation of Vision Language Models (VHELM). VHELM aggregates various datasets to cover one or more of the 9 aspects:,,,,,,,, and . In doing so, we produce a comprehensive, multi-dimensional view of the capabilities of the VLMs across these important factors.

artificial intelligence, natural language, proceedings, (9 more...)

Neural Information Processing Systems

Mar-22-2026, 22:52:54 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Natural Language (0.90)