EgoNormia: Benchmarking Physical Social Norm Understanding

Rezaei, MohammadHossein; Fu, Yicheng; Cuvin, Phil; Ziems, Caleb; Zhang, Yanzhe; Zhu, Hao; Yang, Diyi

Mar-5-2025 – arXiv.org Artificial Intelligence

Human activity is moderated by norms. However, machines are often trained without explicit supervision on norm understanding and reasoning, especially when the norms are grounded in a physical and social context. To improve and evaluate the normative reasoning capability of vision-language models (VLMs), we present EgoNormia $\|\epsilon\|$, consisting of 1,853 ego-centric videos of human interactions, each of which has two related questions evaluating both the prediction and justification of normative actions. The normative actions encompass seven categories: safety, privacy, proxemics, politeness, cooperation, coordination/proactivity, and communication/legibility. To compile this dataset at scale, we propose a novel pipeline leveraging video sampling, automatic answer generation, filtering, and human validation. Our work demonstrates that current state-of-the-art VLMs lack robust norm understanding, scoring a maximum of 45% on EgoNormia (versus a human baseline of 92%). Our analysis of performance in each dimension highlights significant risks to safety and privacy, as well as a lack of collaboration and communication capability, when these models are applied to real-world agents. We additionally show that, through a retrieval-based generation method, it is possible to use EgoNormia to enhance normative reasoning in VLMs.

justification, large language model, machine learning

Country:
- North America
  - United States
    - Arizona (0.04)
    - California > Santa Clara County
      - Palo Alto (0.04)
  - Canada > Ontario
    - Toronto (0.14)
- Europe
  - Italy > Tuscany
    - Florence (0.04)
  - Germany > Bavaria
    - Upper Bavaria > Munich (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - East Asia (0.04)
  - China > Hong Kong (0.04)
  - Thailand > Bangkok
    - Bangkok (0.04)

Genre:
- Research Report (0.82)

Industry:
- Information Technology > Security & Privacy (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning > Agents (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
