HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation
Raza, Shaina, Narayanan, Aravind, Khazaie, Vahid Reza, Vayani, Ashmal, Radwan, Ahmed Y., Chettiar, Mukund S., Singh, Amandeep, Shah, Mubarak, Pandya, Deval
–arXiv.org Artificial Intelligence
Although recent large multimodal models (LMMs) demonstrate impressive progress on vision-language tasks, their alignment with human-centered (HC) principles, such as fairness, ethics, inclusivity, empathy, and robustness, remains poorly understood. We present HumaniBench, a unified evaluation framework designed to characterize HC alignment across realistic, socially grounded visual contexts. HumaniBench contains 32,000 expert-verified image-question pairs derived from real-world news imagery and spanning seven evaluation tasks: scene understanding, instance identity, multiple-choice visual question answering (VQA), multilinguality, visual grounding, empathetic captioning, and image resilience testing. Each task is mapped to one or more HC principles through a principled operationalization of metrics covering accuracy, harmful content detection, hallucination and faithfulness, coherence, cross-lingual quality, empathy, and robustness. We evaluate 15 state-of-the-art LMMs under this framework and observe consistent cross-model trade-offs: proprietary systems achieve the strongest performance on ethics, reasoning, and empathy, while open-source models exhibit superior visual grounding and resilience. All models, however, show persistent gaps in fairness and multilingual inclusivity. We further analyze the effect of inference-time techniques, finding that chain-of-thought prompting and test-time scaling yield 8 to 12% improvements on several HC dimensions. HumaniBench provides a reproducible, extensible foundation for systematic HC evaluation of LMMs and enables fine-grained analysis of alignment trade-offs that are not captured by conventional multimodal benchmarks. https://vectorinstitute.github.io/humanibench/
Dec-1-2025
- Country:
- Asia > India (0.04)
- Europe > United Kingdom
- England > Oxfordshire > Oxford (0.04)
- North America
- Canada
- Alberta > Census Division No. 6
- Calgary Metropolitan Region > Calgary (0.04)
- Manitoba > Winnipeg Metropolitan Region
- Winnipeg (0.04)
- Ontario > Toronto (0.04)
- Quebec > Montreal (0.04)
- Saskatchewan > Saskatoon (0.04)
- United States
- California
- Los Angeles County > Los Angeles (0.04)
- San Francisco County > San Francisco (0.04)
- Colorado (0.04)
- Florida > Orange County
- Orlando (0.04)
- Gulf of Mexico > Central GOM (0.04)
- Illinois > Cook County
- Chicago (0.04)
- Minnesota (0.04)
- New York > New York County
- New York City (0.04)
- Washington > King County
- Seattle (0.04)
- Oceania > Australia
- Australian Capital Territory > Canberra (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Government (1.00)
- Health & Medicine (0.67)
- Information Technology (0.92)
- Law (1.00)
- Leisure & Entertainment > Sports (1.00)
- Media > News (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Issues > Social & Ethical Issues (0.67)
- Machine Learning > Neural Networks
- Deep Learning (1.00)
- Natural Language
- Chatbot (1.00)
- Large Language Model (1.00)
- Vision (1.00)