Anatomy of a Feeling: Narrating Embodied Emotions via Large Vision-Language Models
Saim, Mohammad, Duong, Phan Anh, Luong, Cat, Bhanderi, Aniket, Jiang, Tianyu
arXiv.org Artificial Intelligence
The embodiment of emotional reactions in body parts contains rich information about our affective experiences. We propose a framework that utilizes state-of-the-art large vision-language models (LVLMs) to generate Embodied LVLM Emotion Narratives (ELENA). These are well-defined, multi-layered text outputs, primarily comprising descriptions that focus on the salient body parts involved in emotional reactions. We also employ attention maps and observe that contemporary models exhibit a persistent bias towards the facial region. Despite this limitation, we find that our framework can effectively recognize embodied emotions in face-masked images, outperforming baselines without any fine-tuning. ELENA opens a new trajectory for embodied emotion analysis in the vision modality and enriches modeling in an affect-aware setting.
Sep-25-2025
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Technology:
- Information Technology > Artificial Intelligence
- Cognitive Science > Emotion (1.00)
- Machine Learning > Neural Networks
- Deep Learning (0.71)
- Natural Language > Large Language Model (0.72)
- Vision (1.00)