MMER: Multimodal Multi-task Learning for Speech Emotion Recognition
Ghosh, Sreyan, Tyagi, Utkarsh, Ramaneswaran, S, Srivastava, Harshvardhan, Manocha, Dinesh
arXiv.org Artificial Intelligence
In this paper, we propose MMER, a novel Multimodal Multi-task learning approach for Speech Emotion Recognition. MMER leverages a novel multimodal network based on early fusion and cross-modal self-attention between the text and acoustic modalities, and solves three novel auxiliary tasks for learning emotion recognition from spoken utterances. In practice, MMER outperforms all our baselines and achieves state-of-the-art performance on the IEMOCAP benchmark. Additionally, we conduct extensive ablation studies and result analysis to demonstrate the effectiveness of our proposed approach.
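As a rough illustration of the cross-modal self-attention idea the abstract describes, the PyTorch sketch below lets text token embeddings attend to acoustic frame embeddings and vice versa, then fuses the two attended sequences. The module name, dimensions, pooling, and fusion scheme are illustrative assumptions, not the authors' exact MMER architecture.

```python
# Minimal sketch of cross-modal self-attention between text and acoustic
# features. All names, shapes, and the mean-pool fusion are assumptions
# for illustration, not the paper's actual implementation.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Two attention directions: text queries audio, audio queries text.
        self.text_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.audio_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # text:  (batch, text_len, dim),  e.g. token embeddings from a text encoder
        # audio: (batch, audio_len, dim), e.g. frame embeddings from a speech encoder
        text_attn, _ = self.text_to_audio(query=text, key=audio, value=audio)
        audio_attn, _ = self.audio_to_text(query=audio, key=text, value=text)
        # Fuse the cross-attended sequences and pool to an utterance-level vector.
        fused = torch.cat([text_attn, audio_attn], dim=1)  # (batch, T+A, dim)
        return fused.mean(dim=1)                           # (batch, dim)

if __name__ == "__main__":
    layer = CrossModalAttention()
    text = torch.randn(2, 20, 768)    # dummy text embeddings
    audio = torch.randn(2, 100, 768)  # dummy acoustic embeddings
    features = layer(text, audio)     # (2, 768)
    logits = nn.Linear(768, 4)(features)  # e.g. 4 emotion classes, as in IEMOCAP
    print(logits.shape)  # torch.Size([2, 4])
```

The pooled vector would feed an emotion classification head; in a multi-task setup like the one described, the same fused representation could also feed the auxiliary task heads.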
Jun-3-2023