MMER: Multimodal Multi-task Learning for Speech Emotion Recognition

Sreyan Ghosh, Utkarsh Tyagi, S Ramaneswaran, Harshvardhan Srivastava, Dinesh Manocha

arXiv.org Artificial Intelligence 

In this paper, we propose MMER, a novel Multimodal Multi-task learning approach for Speech Emotion Recognition. MMER leverages a multimodal network based on early fusion and cross-modal self-attention between the text and acoustic modalities, and solves three novel auxiliary tasks for learning emotion recognition from spoken utterances. In practice, MMER outperforms all our baselines and achieves state-of-the-art performance on the IEMOCAP benchmark. Additionally, we conduct extensive ablation studies and results analysis to demonstrate the effectiveness of our proposed approach.
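For readers unfamiliar with cross-modal self-attention, the following is a minimal PyTorch sketch of the kind of block the abstract describes: features from one modality act as queries over the other before a shared downstream head. All dimensions, layer choices, and names here are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Illustrative cross-modal attention block: one modality's features
    query the other's. Hidden size, head count, and the residual/LayerNorm
    arrangement are assumptions for the sketch, not the authors' config."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_feats: torch.Tensor,
                context_feats: torch.Tensor) -> torch.Tensor:
        # query_feats:   (batch, T_q, dim), e.g. text token embeddings
        # context_feats: (batch, T_k, dim), e.g. acoustic frame embeddings
        attended, _ = self.attn(query_feats, context_feats, context_feats)
        return self.norm(query_feats + attended)  # residual + layer norm

if __name__ == "__main__":
    text = torch.randn(2, 40, 768)    # hypothetical text sequence
    audio = torch.randn(2, 200, 768)  # hypothetical acoustic sequence
    block = CrossModalAttention()
    # Text tokens attend over acoustic frames; the reverse direction
    # would simply reuse the block with the arguments swapped.
    fused = block(text, audio)
    print(fused.shape)  # torch.Size([2, 40, 768])
```

The fused representation would then feed the emotion classifier alongside the auxiliary task heads; how the two attention directions and the early-fused stream are combined is specific to the paper and not reproduced here.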
