VHELM: A Holistic Evaluation of Vision Language Models
Chi Heem Wong
Neural Information Processing Systems
Current benchmarks for assessing vision-language models (VLMs) often focus on perception or problem-solving capabilities and neglect other critical aspects such as fairness, multilinguality, and toxicity. They also differ in their evaluation procedures and scope, making it difficult to compare models. To address these issues, we extend the HELM framework to VLMs and present the Holistic Evaluation of Vision Language Models (VHELM). VHELM aggregates various datasets, each covering one or more of nine aspects: visual perception, knowledge, reasoning, bias, fairness, multilinguality, robustness, toxicity, and safety. In doing so, we produce a comprehensive, multi-dimensional view of VLM capabilities across these important factors.
Mar-27-2025, 16:28:10 GMT
- Country:
- North America > United States > California (0.28)
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Education > Educational Setting (0.46)
- Health & Medicine (1.00)
- Law (0.67)
- Technology: