EmoSphere-SER: Enhancing Speech Emotion Recognition Through Spherical Representation with Auxiliary Classification
Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Seong-Whan Lee
–arXiv.org Artificial Intelligence
Speech emotion recognition predicts a speaker's emotional state from speech signals using discrete labels or continuous dimensions such as arousal, valence, and dominance (VAD). We propose EmoSphere-SER, a joint model that integrates spherical VAD region classification to guide VAD regression for improved emotion prediction. In our framework, VAD values are transformed into spherical coordinates that are divided into multiple spherical regions, and an auxiliary classification task predicts which spherical region each point belongs to, guiding the regression process. Additionally, we incorporate a dynamic weighting scheme and a style pooling layer with multi-head self-attention to capture spectral and temporal dynamics, further boosting performance. This combined training strategy reinforces structured learning and improves prediction consistency. Experimental results show that our approach exceeds baseline methods, confirming the validity of the proposed framework.
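The spherical transformation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the centering of the VAD space and the use of sign-based octants as the "spherical regions" are assumptions made here for concreteness.

```python
import numpy as np

def vad_to_spherical(vad, center=np.zeros(3)):
    """Map Cartesian VAD points (N, 3) to spherical coords (r, theta, phi).

    `center` is a hypothetical reference point for the VAD space;
    the paper's exact centering is not specified in the abstract.
    """
    shifted = vad - center
    x, y, z = shifted[:, 0], shifted[:, 1], shifted[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    # Polar angle; clip guards against rounding outside [-1, 1].
    theta = np.arccos(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))
    phi = np.arctan2(y, x)  # azimuth
    return r, theta, phi

def spherical_region(vad, center=np.zeros(3)):
    """Assign each point to one of 8 regions (octants) by coordinate sign.

    A simple stand-in for the paper's region partition: the auxiliary
    classifier would be trained to predict these region indices.
    """
    signs = (vad - center) >= 0
    return signs[:, 0] * 4 + signs[:, 1] * 2 + signs[:, 2] * 1
```

With this partition, the auxiliary classification loss over region indices could be combined with the VAD regression loss during training, which is the joint-objective idea the abstract describes.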
Oct-20-2025