HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation
Raza, Shaina, Narayanan, Aravind, Khazaie, Vahid Reza, Vayani, Ashmal, Radwan, Ahmed Y., Chettiar, Mukund S., Singh, Amandeep, Shah, Mubarak, Pandya, Deval
arXiv.org Artificial Intelligence
Although recent large multimodal models (LMMs) demonstrate impressive progress on vision-language tasks, their alignment with human-centered (HC) principles, such as fairness, ethics, inclusivity, empathy, and robustness, remains poorly understood. We present HumaniBench, a unified evaluation framework designed to characterize HC alignment across realistic, socially grounded visual contexts. HumaniBench contains 32,000 expert-verified image-question pairs derived from real-world news imagery and spanning seven evaluation tasks: scene understanding, instance identity, multiple-choice visual question answering (VQA), multilinguality, visual grounding, empathetic captioning, and image resilience testing. Each task is mapped to one or more HC principles through a principled operationalization of metrics covering accuracy, harmful content detection, hallucination and faithfulness, coherence, cross-lingual quality, empathy, and robustness. We evaluate 15 state-of-the-art LMMs under this framework and observe consistent cross-model trade-offs: proprietary systems achieve the strongest performance on ethics, reasoning, and empathy, while open-source models exhibit superior visual grounding and resilience. All models, however, show persistent gaps in fairness and multilingual inclusivity. We further analyze the effect of inference-time techniques, finding that chain-of-thought prompting and test-time scaling yield 8 to 12% improvements on several HC dimensions. HumaniBench provides a reproducible, extensible foundation for systematic HC evaluation of LMMs and enables fine-grained analysis of alignment trade-offs that are not captured by conventional multimodal benchmarks. https://vectorinstitute.github.io/humanibench/
Dec-1-2025
- Country:
  - North America
    - United States (1.00)
    - Canada (1.00)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Media > News (1.00)
  - Leisure & Entertainment > Sports (1.00)
  - Law (1.00)
  - Government (1.00)
  - Information Technology (0.92)
  - Health & Medicine (0.67)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Issues > Social & Ethical Issues (0.67)
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)