MAVEN: Multi-modal Attention for Valence-Arousal Emotion Network
Ahire, Vrushank, Shah, Kunal, Khan, Mudasir Nazir, Pakhale, Nikhil, Sookha, Lownish Rai, Ganaie, M. A., Dhall, Abhinav
–arXiv.org Artificial Intelligence
This paper introduces MAVEN (Multi-modal Attention for Valence-Arousal Emotion Network), a novel architecture for dynamic emotion recognition through dimensional modeling of affect. The model uniquely integrates visual, audio, and textual modalities via a bi-directional cross-modal attention mechanism with six distinct attention pathways, enabling comprehensive interactions between all modality pairs. Our proposed approach employs modality-specific encoders to extract rich feature representations from synchronized video frames, audio segments, and transcripts. The architecture's novelty lies in its cross-modal enhancement strategy, where each modality representation is refined through weighted attention from other modalities, followed by self-attention refinement through modality-specific encoders. Rather than directly predicting valence-arousal values, MAVEN predicts emotions in a polar coordinate form, aligning with psychological models of the emotion circumplex. Experimental evaluation on the Aff-Wild2 dataset demonstrates the effectiveness of our approach, with performance measured using Concordance Correlation Coefficient (CCC). The multi-stage architecture demonstrates superior ability to capture the complex, nuanced nature of emotional expressions in conversational videos, advancing the state-of-the-art (SOTA) in continuous emotion recognition in-the-wild. Code can be found at: https://github.com/Vrushank-Ahire/MAVEN_8th_ABAW.
arXiv.org Artificial Intelligence
Mar-16-2025
- Genre:
- Research Report (0.82)
- Industry:
- Technology:
- Information Technology > Artificial Intelligence
- Cognitive Science > Emotion (1.00)
- Machine Learning
- Neural Networks > Deep Learning (1.00)
- Statistical Learning (1.00)
- Representation & Reasoning (1.00)
- Vision (1.00)
- Information Technology > Artificial Intelligence