Towards Unified Music Emotion Recognition across Dimensional and Categorical Models
Kang, Jaeyong, Herremans, Dorien
arXiv.org Artificial Intelligence
One of the most significant challenges in Music Emotion Recognition (MER) is that emotion labels are heterogeneous across datasets: some use categorical representations (e.g., happy, sad), while others use dimensional ones (e.g., valence-arousal). In this paper, we present a unified multitask learning framework that combines these two types of labels and can therefore be trained on multiple datasets. The framework uses an input representation that combines musical features (i.e., key and chords) with MERT embeddings. Moreover, knowledge distillation is employed to transfer the knowledge of teacher models trained on individual datasets to a student model, enhancing its ability to generalize across multiple tasks. To validate the proposed framework, we conducted extensive experiments on a variety of datasets, including MTG-Jamendo, DEAM, PMEmo, and EmoMusic. According to our experimental results, the inclusion of musical features, multitask learning, and knowledge distillation significantly enhances performance. In particular, our model outperforms the state-of-the-art models on the MTG-Jamendo dataset. Our work contributes to MER by allowing categorical and dimensional emotion labels to be combined in one unified framework, thus enabling training across datasets.

INTRODUCTION
Music plays an essential role in influencing human emotions [36]. In the past decades, numerous Music Emotion Recognition (MER) models have been developed.
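The abstract describes training one model on datasets with different label types, with teacher models guiding a student. A minimal sketch of how such a per-sample objective could be assembled is given below. It is not the authors' implementation: the loss choices (binary cross-entropy for categorical tags, mean squared error for valence-arousal), the weighting factor `alpha`, and all function and key names are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce(logits, targets):
    """Mean binary cross-entropy between logits and targets in [0, 1]."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps)
    return total / len(logits)


def mse(preds, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def multitask_loss(sample, student_out, teacher_out=None, alpha=0.5):
    """Hypothetical per-sample loss for a student with two output heads.

    sample:      {"label_type": "categorical" | "dimensional", "labels": [...]}
    student_out: {"tag_logits": [...], "va": [valence, arousal]}
    teacher_out: optional teacher output for this sample's label type
                 (tag probabilities or valence-arousal values)
    alpha:       assumed weight of the distillation term
    """
    if sample["label_type"] == "categorical":
        # Only the tag head receives a training signal for this sample.
        pred = student_out["tag_logits"]
        loss_fn = bce
    elif sample["label_type"] == "dimensional":
        # Only the valence-arousal head receives a training signal.
        pred = student_out["va"]
        loss_fn = mse
    else:
        raise ValueError("unknown label type: %r" % sample["label_type"])

    hard = loss_fn(pred, sample["labels"])
    # Distillation term: pull the student toward the teacher's output.
    soft = loss_fn(pred, teacher_out) if teacher_out is not None else 0.0
    return hard + alpha * soft
```

In this sketch, each sample contributes only through the head that matches its dataset's label type, which is one simple way to let categorical and dimensional datasets share a single model.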
Feb-6-2025
- Country:
  - Europe (0.28)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Leisure & Entertainment (1.00)
  - Media > Music (1.00)
- Technology:
  - Information Technology > Artificial Intelligence
    - Cognitive Science > Emotion (1.00)
    - Machine Learning
      - Neural Networks > Deep Learning (1.00)
      - Performance Analysis > Accuracy (1.00)
      - Transfer Learning (0.91)
    - Natural Language > Large Language Model (0.68)