MMER: Multimodal Multi-task Learning for Speech Emotion Recognition
Ghosh, Sreyan, Tyagi, Utkarsh, Ramaneswaran, S, Srivastava, Harshvardhan, Manocha, Dinesh
arXiv.org Artificial Intelligence
In this paper, we propose MMER, a novel Multimodal Multi-task learning approach for Speech Emotion Recognition. MMER leverages a novel multimodal network based on early fusion and cross-modal self-attention between the text and acoustic modalities, and solves three novel auxiliary tasks for learning emotion recognition from spoken utterances. In practice, MMER outperforms all our baselines and achieves state-of-the-art performance on the IEMOCAP benchmark. Additionally, we conduct extensive ablation studies and result analysis to demonstrate the effectiveness of our proposed approach.
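As a rough illustration of the cross-modal self-attention idea the abstract describes, the PyTorch sketch below lets text token embeddings attend to acoustic frame embeddings and vice versa, then fuses the two attended sequences. The module name, dimensions, pooling, and fusion scheme are illustrative assumptions, not the authors' exact MMER architecture.

```python
# Minimal sketch of cross-modal self-attention between text and acoustic
# features. All names, shapes, and the mean-pool fusion are assumptions
# for illustration, not the paper's actual implementation.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Two attention directions: text queries audio, audio queries text.
        self.text_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.audio_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # text:  (batch, text_len, dim),  e.g. token embeddings from a text encoder
        # audio: (batch, audio_len, dim), e.g. frame embeddings from a speech encoder
        text_attn, _ = self.text_to_audio(query=text, key=audio, value=audio)
        audio_attn, _ = self.audio_to_text(query=audio, key=text, value=text)
        # Fuse the cross-attended sequences and pool to an utterance-level vector.
        fused = torch.cat([text_attn, audio_attn], dim=1)  # (batch, T+A, dim)
        return fused.mean(dim=1)                           # (batch, dim)

if __name__ == "__main__":
    layer = CrossModalAttention()
    text = torch.randn(2, 20, 768)    # dummy text embeddings
    audio = torch.randn(2, 100, 768)  # dummy acoustic embeddings
    features = layer(text, audio)     # (2, 768)
    logits = nn.Linear(768, 4)(features)  # e.g. 4 emotion classes, as in IEMOCAP
    print(logits.shape)  # torch.Size([2, 4])
```

The pooled vector would feed an emotion classification head; in a multi-task setup like the one described, the same fused representation could also feed the auxiliary task heads.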
Jun-3-2023